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Abstract 

Enhanced  reality  visualization  is  the  process  of  enhancing  an  image  by  adding 
to  it  information  which  is  not  present  in  the  original  image.  A  wide  variety  of 
information  can  be  added  to  an  image  ranging  from  hidden  lines  or  siirfaces  to 
textual  or  iconic  data  about  a  particular  part  of  the  image.  Enhanced  reality 
visualization  is  partictilarly  well  suited  to  neurosurgery.  By  rendering  brain 
structures  which  are  not  visible,  at  the  correct  location  in  an  image  of  a  patient’s 
head,  the  surgeon  is  essentially  provided  with  X-ray  vision.  He  can  visualize  the 
spatial  relationship  between  brain  structures  before  he  performs  a  craniotomy 
and  during  the  surgery  he  can  see  what’s  under  the  next  layer  before  he  cuts 
through.  Given  a  video  image  of  the  patient  and  a  three  dimensional  model  of 
the  patient’s  brain,  the  problem  enhanced  reality  visualization  faces  is  to  render 
the  model  from  the  correct  viewpoint  and  overlay  it  on  the  original  image.  The 
relationship  between  the  coordinate  freunes  of  the  patient,  the  patient’s  internal 
anatomy  scans  and  the  image  plane  of  the  camera  observing  the  patient  must  be 
established.  This  problem  is  closely  related  to  the  camera  calibration  problem. 
This  report  presents  a  new  approach  to  finding  this  relationship  and  develops  a 
system  for  performing  enhanced  reality  visualization  in  a  sirrgical  environment. 
Immediately  prior  to  surgery  a  few  circxilar  fiducials  are  placed  near  the  surgical 
site.  An  initial  registration  of  video  and  internal  data  is  performed  using  a 
laser  scanner.  Following  this,  o\rr  method  is  fully  automatic,  runs  in  nearly 
real-time,  is  accurate  to  within  a  pixel,  allows  both  patient  and  camera  motion, 
automatically  corrects  for  changes  to  the  internal  camera  parameters  (focal 
length,  focus,  aperture,  etc.)  and  requires  only  a  single  image. 
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Chapter  1 
Introduction 


LI  Computers  in  Medicine 

The  use  of  computers  in  medicine  has  increased  dramatically  over  the  last 
decade  [Bemmel  et  al.,  1985,  Reggia  and  Tuhnm,  1985,  Miller,  1990,  Chang, 
1993].  As  a  result,  nearly  all  aspects  of  medical  care  have  seen  improvement 
from  the  introduction  of  computer-based  tools.  These  tools  range  from  auto¬ 
mated  patient  record  keeping  to  three  dimensional  visualization  of  internal 
anatomy  and  from  computer  assisted  diagnosis  to  automatic  drug  interaction 
and  allergy  screening.  The  use  of  computers  to  assist  physicians  in  the  plan¬ 
ning  and  execution  of  surgical  procedures  is  also  growing  [Lemoine  et  al,  1991, 
Smith  et  al,  1991,  Pieper  et  al,  1992,  Verbeeck  et  al,  1993].  One  area  which 
could  benefit  greatly  from  more  sophisticated  computer-based  tools  is  neuro¬ 
surgery.  The  need  to  minimize  collateral  damage  while  removing  diseased 
tissue  requires  extreme  precision.  In  addition,  damage  to  certain  critical  brain 
regions,  such  as  the  motor  strip,  must  be  avoided  if  at  all  possible.  Planning 
a  surgical  approach  meeting  all  of  the  criteria  is  difficult  and  tedious.  Identi¬ 
fication  of  specific  brain  structru'es  and  modification  of  the  planned  approach 
are  often  difficult  dming  the  surgical  procedirre,  placing  additional  emphasis 
on  planning.  Traditionally,  precision  neurosurgery  requires  the  use  of  a  stereo¬ 
tactic  frame  which  is  rigidly  attached  to  the  patient  s  skull.  Figure  1-1  shows 
some  typical  stereotactic  frames.  The  frame  is  attached  prior  to  and  is  visible 
in  presurgical  imaging  such  as  magnetic  resonance  (MR)  or  computed  tomog¬ 
raphy  (CT)  imaging.  This  allows  siugical  plans  based  on  presurgical  internal 
anatomy  scans  to  be  transferred  to  the  patient  using  the  stereotactic  frame 
as  a  reference.  Frequently  the  patient  must  wear  the  frame  for  several  days 
between  imaging  and  srugery.  The  frames  are  a  significant  discomfort  to  the 
patient  and  are  cmnbersome  to  the  svugeon,  possibly  limiting  his  flexibility 
during  the  procedrue. 

A  system  which  improves  surgical  precision,  enables  identification  of  brain 
structures,  allows  modification  of  the  planned  approach  during  the  surgical 
procedure  and  does  not  require  the  use  of  a  stereotactic  frame  would  be  a  vast 
improvement  over  traditional  nevuosurgical  procedures. 
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1.2  What  is  Enhanced  Reality  Visualization? 

Computer  assisted  surgery  is  a  relatively  new  development  which  attempts 
to  provide  the  surgeon  with  a  tool  to  assist  in  the  planmng  and  execution  of 
surgical  procedures  [Adams  et  al.,  1990,  Lavallee  and  Cinquin,  19901.  Image 
guided  surgery  is  a  specific  type  of  computer  assisted  surgery  which  uses  ad¬ 
vanced  three  dimensional  visualization  techniques  to  provide  the  surgeon  with 
a  wealth  of  valuable  information  not  normally  available  in  the  operating  room 
[Pelizzari  et  al. ,  1991,  Wells  et  al. ,  1993,  Black  et  al. ,  1993,  Grimson  et  al. ,  19941. 
In  essence,  a  complex  surgical  procedure  can  be  navigated  visually  with  great 
precision  by  overlaying  on  an  image  of  the  patient  a  color  coded  preoperative 
plan  specif^ng  details  such  as  the  locations  of  incisions,  areas  to  be  avoided 
and  the  diseased  tissue.  The  process  of  aligning  the  preoperative  plans  with 
and  overlaying  them  on  the  patient  is  known  as  enhanced  reality  visualization. 
Enhanced  reality  visualization  is  the  process  of  enhancing  an  image  by  adding 
information  to  it.  The  information  added  can  be  anything  from  text  to  icons  or 
color  coding  to  three  dimensional  surfaces.  Figure  1-2  shows  an  enhanced  re¬ 
ality  visualization  of  a  patient  about  to  undergo  nexmosurgery.  The  area  shown 
in  ^  is  the  tumor  to  be  removed  and  the  ventricles  are  shown  in 


Figxire  1-2;  Enhanced  reality  visualization  showing  a  trnnor  and  ventricles. 
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1.3  A  Scenario  Which  Utilizes  Enhanced  Reality 
Visualization 

1.  A  patient  needing  neurosurgery  is  scanned  by  one  or  more  three  dimen¬ 
sional,  high  resolution,  internal  anatomy  scanners,  such  as  magnetic  res¬ 
onance  (MR)  or  computed  tomography  (CT). 

2.  Each  internal  anatomy  scan  is  segmented  by  tissue  type  (white  matter, 
gray  matter,  bone,  etc).  The  various  scans  of  the  patient  are  also  registered 
with  one  another.  The  result  is  a  three  dimensional  model  of  the  patient’s 
brain. 

3.  Tools  which  identify,  classify  and  label  brain  structures  such  as  motor 
strip  and  speech  centers  are  used  to  add  the  required  detail  to  the  model 
of  the  patient’s  brain. 

4.  Once  the  model  of  the  patient’s  brain  has  been  constructed,  enhanced  re¬ 
ality  visualization  is  possible.  Enhanced  reality  visualization  can  be  used 
to  help  plan  the  surgical  procedxire.  Live  video  of  the  patient  is  mixed 
with  information  generated  from  the  brain  model  allowing  the  surgeon  to 
view  the  internal  anatomy  of  the  patient  in  a  non-invasive  manner.  This 
capability  allows  the  surgeon  to  test  the  feasibility  of  various  possible  sur¬ 
gical  approaches  on  the  actual  patient.  The  enhanced  reality  visualization 
may  be  presented  to  the  surgeon  using  one  of  several  methods,  such  as  a 
head-motmted  display,  a  transparent  projection  screen  or  a  surgical  mi¬ 
croscope.  Details  regarding  the  surgical  approach  and  procedure  can  he 
added  to  the  brain  model. 

5.  The  surgical  procediire  is  performed  using  enhanced  reality  visualization. 
Enhanced  reality  enables  the  surgical  site  to  be  precisely  located  and 
facilitates  accurate  transfer  of  surgical  plans  to  the  patient. 

1.4  The  Problem  and  Our  Approach 

Enhanced  reality  visualization  is  an  integral  part  of  the  image  guided  surgery 
paradigm,  however  compared  with  other  aspects,  little  effort  has  been  expended 
on  this  area  [Adams  et  al,  1990,  Lavallee  and  Cinquin,  1990,  Pelizzari  et  al, 
1991,  Wells  et  al.,  1993,  Grimson  et  al.,  1994].  In  order  to  produce  enhanced 
reality  visualizations  we  must  be  able  to  quickly  and  accurately  align  informa¬ 
tion  such  as  a  brain  model  with  an  image.  There  are  several  other  issues  which 
must  be  addressed  before  a  full  function  enhanced  reality  visualization  system 
can  be  created.  Some  of  these  challenges  are  listed  below. 
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Display  method  The  displays  currently  available  for  enhanced  reality  visu¬ 
alization  are  less  than  optimal.  Head  mounted  displays  are  still  heavy, 
awkward  and  have  relatively  low  resolution.  Conventional  CRT’s  have 
better  resolution  but  limit  the  applications  of  enhanced  reality  visualiza¬ 
tion. 

Rendering  The  complexity  of  information  and  the  detail  and  realism  with 
which  it  can  be  rendered  while  updating  at  a  reasonable  frame  rate  are 
limited. 

Information  acquisition  Acquiring  information  and  converting  it  to  a  form 
suitable  for  use  in  enhanced  reality  visualization  can  be  a  difficult  and 
time  consuming  process. 

Virtual  reality  faces  many  of  the  same  issues  as  enhanced  reality  visual¬ 
ization.  There  already  exists  a  significant  research  effort  in  virtual  reality 
examining  these  problems.  While  there  are  many  similarities,  enhanced  real¬ 
ity  differs  fundamentally  from  virtual  reality  in  that  it  is  anchored  in  the  real 
world.  Enhanced  reality  visualization  must  align  the  enhancement  with  a  real 
image  quickly  and  accurately.  The  ability  to  perform  this  alignment  quickly 
and  accurately  is  fundamental  to  enhanced  reality  visualization  and  will  be  the 
focus  of  this  report.  Given  a  video  image  and  a  three  dimensional  model,  the 
problem  is  to  render  the  model  from  the  correct  viewpoint  and  overlay  it  on  the 
original  image.  The  relationship  between  the  coordinate  frames  of  the  world, 
the  model  and  the  image  plane  of  the  camera  must  be  established.  This  prob¬ 
lem  is  closely  related  to  the  camera  calibration  problem.  Stated  more  precisely 
the  problem  is  to; 

Determine  the  perspective  transformation  which  maps  model  coordi¬ 
nates  to  image  coordinates  in  “real-time”,  allowing  the  information 
from  the  model  to  he  added  to  an  image  in  the  correct  location. 

An  overview  of  the  problem  is  shown  in  Figure  1-3.  We  solve  for  the  transforma¬ 
tion  which  maps  model  coordinates  to  image  coordinates  directly.  An  alternate 
approach  solves  for  several  transformations  and  then  composes  them  into  a 
single  transformation  from  model  to  image  coordinates.  For  example,  a  refer¬ 
ence  coordinate  system  could  be  defined  for  the  physical  object(s).  The  first  step 
might  be  to  find  the  transformation  which  aligns  the  model  with  the  reference 
coordinate  system.  Next,  the  transformation  from  reference  coordinates  to  im¬ 
age  coordinates  must  be  determined.  Solving  for  intermediate  transformations 
can  introduce  error  into  the  solution  and  is  computationally  more  expensive. 
Unless  this  data  is  needed  there  is  no  reason  to  break  the  problem  into  several 
pieces. 
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Enhanced  Reality 
Visualization  with 
Hidden  Lines  Added 


Figure  1-3:  Overview  of  this  report. 

We  will  define  several  terms  that  will  be  used  throughout  this  report.  An 
enhanced  reality  visualization  is  composed  of  a  virtual  image  overlayed  on  a  raw 
image.  A  raw  image  is  an  image  of  the  real  world  prior  to  any  enhancement. 
The  coordinates  of  the  raw  image  are  referred  to  as  image  coordinates.  The 
information  which  will  be  added  to  the  raw  image  to  produce  the  enhancement  is 
referred  to  as  the  model.  The  coordinates  of  the  model  are  referred  to  as  model 
coordinates.  A  virtual  image  is  generated  by  rendering  the  model  from  a 
particular  view  point.  The  view  point  captures  the  relative  placement  and 
orientation  between  model  and  viewer.  Finally,  the  term  world  coordinates 
is  used  to  refer  to  an  arbitrary  coordinate  system  attached  to  an  object  visible 
in  the  raw  image. 

Figure  1-4  shows  an  overview  of  our  method  in  the  context  of  neurosurgery. 
A  novel  formulation  of  the  camera  calibration  problem  allows  us  to  quickly 
and  easily  obtain  the  perspective  transformation  mapping  model  (MR  or  CT) 
coordinates  to  image  coordinates.  The  perspective  transformation  is  then  used 
to  generate  the  enhanced  reality  visualization.  Our  approach  utilizes  several 
circular  fiducials  placed  near  the  surgical  site. 

The  circular  fiducials  enable  us  to  recover  a  scale  factor  at  each  fiducial  (the 
focal  length  of  the  camera  divided  by  the  depth  of  the  fiducial).  Given  the  scale 
factor  as  well  as  the  image  and  model  (MR  or  CT)  coordinates  for  each  fiducial, 


1.4.  THE  PROBLEM  AND  OUR  APPROACH 
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Enhanced  Reality 
Visualization  of 
the  Patient  with 
the  Brain  Added 

Figtire  1-4:  Overview  of  this  report  showing  the  circular  fiducials. 


Model  of  Patient’s 
Brain  and  Fiducials 


Figure  1-5:  Determining  the  model  (MR)  coordinates  of  the  fiducials. 
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the  problem  of  determining  the  perspective  transformation  which  maps  model 
coordinates  to  image  coordinates  reduces  to  three  sets  of  linear  equations.  In 
general,  the  model  coordinates  of  the  fiducials  are  not  known  a  priori,  and  they 
must  be  calibrated.  Figure  1-5  depicts  the  process  of  determining  the  model 
(MR  or  CT)  coordinates  of  the  fiducials.  Dming  an  initial  calibration  phase  a 
laser  scanner  is  used  to  register  the  world  coordinates  of  the  fiducials  with  the 
model  coordinates  of  the  MR  or  CT  data.  Once  the  initial  calibration  is  complete, 
oim  method  is  fully  automatic  and  uses  visual  information  exclusively.  Some  of 
the  additional  characteristics  of  our  approach  which  make  it  particularly  well 
suited  to  enhanced  reality  visualization  are: 

1.  Requires  only  a  single  image  with  a  few  fiducials 

2.  Does  not  require  internal  calibration  or  known  focal  length 

3.  Accurate  to  within  a  pixel 

4.  Solution  is  non-iterative 

The  remaining  chapters  of  this  report  are  organized  as  follows:  Chapter  2 
reviews  current  enhanced  reality  visualization  techniques.  Chapter  3  devel¬ 
ops  a  basic  camera  model  and  contains  a  brief  discussion  of  current  camera 
calibration  techniques.  Chapter  4  presents  our  method  and  an  overview  of  its 
implementation.  Chapter  5  provides  the  details  (theory,  error  analysis  and 
empirical  results)  associated  with  circular  fiducials.  Chapter  6  discusses  one 
method  of  calibrating  fiducials.  Chapter  7  shows  the  resxilts  of  our  method  for 
a  test  object  and  a  plastic  skull.  Finally,  Chapter  8  presents  our  conclusions. 


Chapter  2 
Related  Work 


Several  groups  of  researchers  have  recently  been  investigating  enhanced  real¬ 
ity.  The  proposed  applications  for  enhanced  reality  range  from  laser  printer 
repair  to  aircraft  manufacture.  Current  research  efforts  in  enhanced  reality 
visualization  differ  in  many  implementation  details.  The  one  thing  they  all 
have  in  common  is  the  requirement  to  align  a  model  with  an  image  of  the  real 
world.  In  this  chapter  we  will  examine  several  different  approaches. 


2.1  Medical  Applications 

2. L 1  Aachen  University  of  Technology 

A  group  at  the  Aachen  University  of  Technology  in  Germany  has  developed 
a  “Computer  Assisted  Surgeiy”  module  for  use  in  ENT  surgical  procedures 
[Adams  et  al,  19901.  A  model  of  the  patient  is  produced  from  presurgical  CT 
scans.  Radiopaque  markings  are  attached  to  the  patient’s  skull  prior  to  the 
presTirgical  scans  for  use  as  reference  points.  The  system  is  calibrated  using 
a  hand-gxiided  electro-mechanical  three  dimensional  coordinate  di^tizer.  The 
digitizer  is  used  to  measure  the  positions  of  several  reference  points.  With 
correspondence  between  digitizer  and  CT  points  the  transformation  from  the 
CT  coordinates  to  digitizer  coordinates  can  be  calculated  using  3D/3D  matclung. 
Dixring  an  operation  the  surgeon  can  use  the  digitizer  to  point  at  an  umdentified 
structure.  Three  perpendicular  views  of  the  CT  data  corresponding  to  the 
location  of  the  digitizer  are  then  displayed  on  a  nearby  CRT.  The  system  must 
be  recalibrated  every  time  the  patient  moves  with  respect  to  the  digitizer.  The 
reported  accuracy  is  better  than  ±lmm. 


2.L2  TIME 

A  group  at  TIME  in  Grenoble,  France  has  developed  a  “Computer  Assisted 
Medical  Intervention”  module  [Lavallee  and  Cinquin,  19901.  A  model  of  the 
patient’s  internal  anatomy  is  produced  from  presurgical  imaging.  This  system 


21 


22 


CHAPTER  2.  RELATED  WORK 


uses  a  surgical  robot  or  guidance  system  with  modes  that  range  from  passive 
to  semi-autonomous.  In  passive  mode  the  robot  provides  visual  feedback  by 
overlaying  the  position  of  an  instrumented  probe  on  the  presurgical  scans.  In 
semi-autonomous  mode  some  portions  of  the  surgical  procedirre  are  performed 
by  the  robot  under  the  supervision  of  the  surgeon.  The  system  is  calibrated  us¬ 
ing  a  special  calibration  cage  made  of  four  Plexiglas  planes  containing  metallic 
balls.  The  calibration  cage  is  placed  near  the  patient  and  a  pair  of  X-ray  im¬ 
ages  are  taken.  The  relationship  between  the  presurgical  imaging,  the  X-ray 
device  and  the  surgical  robot  can  be  established  using  3D/3D  matching.  Again 
if  the  patient  moves  relative  to  the  robot  (or  the  X-ray  device)  the  system  must 
be  recalibrated.  Accuracy  for  the  instrumented  probe  is  reported  as  ~5mm. 
Accuracies  for  other  modes  are  not  reported. 

2.1.3  University  of  Chicago 

A  group  at  the  University  of  Chicago  has  developed  a  method  for  “Interactive 
3D  Patient  -  Image  Registration”  [Pelizzari  et  al,  1991].  The  method  is  used  to 
position  patients  for  radiation  therapy.  Again,  a  model  of  the  patient’s  internal 
anatomy  is  produced  from  presiargical  imaging.  The  model  is  used  to  plan  the 
geometry  of  radiation  therapy  beams.  Because  of  the  non-invasive  nature  of 
radiation  therapy  it  is  difficult  to  transfer  the  beam  geometry  plarmed  using  the 
model  to  the  actual  patient.  A  Polhemus  3Space  tracker  and  localizer  are  used 
as  a  magnetic  3D  digitizer  to  measure  the  surface  of  the  patient.  The  model  and 
the  measured  surface  are  then  registered  using  3D/3D  surface  fitting.  Once  the 
registration  has  been  performed,  the  magnetic  digitizer  is  used  again  to  mark 
the  patient  for  setup.  The  intersections  of  the  three  principle  planes  with  the 
patient  are  traced.  These  marks  are  then  used  as  reference  for  positioning 
the  therapy  machine.  The  therapy  machine  must  be  aligned  manually.  If 
the  patient  moves  the  entire  calibration  need  not  be  reperformed,  however  the 
therapy  machine  must  be  realigned  with  the  reference  marks  on  the  patient. 
Accuracies  of  ~lmm  and  ~1°  are  reported. 

2.L4  Massachusetts  Institute  of  Technology 

A  group  at  MIT’s  Artificial  Intelligence  Laboratory  has  developed  “An  Auto¬ 
matic  Registration  Method  for  Frameless  Stereotaxy,  Image  Guided  Surgery, 
and  Enhanced  Reality  Visualization”  iGrimson  et  al.,  1994].  As  in  the  previous 
work  described,  a  model  of  the  patient’s  internal  anatomy  is  produced  from 
presurgical  imaging.  A  Technical  Arts  laser  range  scanner  is  used  to  collect  a 
set  of  3D  data  points  from  the  patient’s  skin  surface.  The  model  and  the  laser 
data  are  registered  using  3D/3D  surface  matching.  A  special  calibration  object 
IS  used  to  calibrate  the  laser  scanner  and  calibrate  the  location  of  a  camera  on 
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the  laser  scanner.  Once  the  model  is  registered  with  the  laser  data  and  the 
location  of  the  laser  scanner  camera  is  calibrated,  the  model  can  be  overlayed 
with  video  of  the  patient.  The  calibration  must  be  reperformed  if  the  patient 
or  camera  move  and  the  overlay  can  only  be  generated  for  a  single  viewpoint. 
The  reported  accuracy  of  this  method  is  ~lmm. 


2.L5  Stanford 

A  group  at  Stanford  University  has  developed  “Treatment  Planning  for  a  Ra- 
diosurgical  System  with  General  Kinematics”  ISchweikard  et  al,  1994].  The 
method  is  used  to  plan  and  perform  radiosurgery.  A  model  of  the  patient’s 
internal  anatomy  is  produced  from  presxirgical  imaging.  The  radiosurgery  is 
planned  using  the  model.  In  addition,  the  model  is  used  to  synthesize  ra¬ 
diographs  from  different  view  points.  A  total  of  over  400  such  radiographs 
are  produced.  During  the  radiosiargery,  two  nearly  orthogonal  X-ray  images 
of  the  patient  are  taken  and  compared  with  the  precomputed  radiographs  to 
determine  the  patient’s  position  and  orientation  with  respect  to  the  treatment 
machine.  The  patient’s  position  and  orientation  can  be  determined  about  twice 
per  second.  The  treatment  machine  (X-ray  system,  radiation  source,  etc.)  must 
be  calibrated  separately  using  a  special  calibration  routine.  The  accuracy  of 
this  method  is  not  reported. 


2.L6  University  of  North  Carolina 

A  group  at  the  University  of  North  Carolina  has  developed  a  method  for  “Merg¬ 
ing  Virtual  Objects  with  the  Real  World”  [Bajura  et  al,  1992].  This  system 
allows  the  user  to  see  ultrasoimd  imagery  overlaid  on  a  patient  in  near  real¬ 
time.  A  six  degrees  of  freedom  (DOF)  Polhemus  SSpace  tracker  is  mounted 
on  the  probe  used  to  acquire  ultrasound  images.  A  second  6DOF  tracker  is 
attached  to  the  head-movmted  display  (HMD)  used  to  view  the  overlay.  Images 
from  a  camera  also  mounted  on  the  HMD  are  combined  with  the  ultrasound  im¬ 
ages  to  produce  the  overlay.  Since  both  trackers  report  position  and  orientation 
it  is  a  simple  matter  to  transform  between  ultrasoimd  tracker  coordinates  and 
HMD  tracker  coordinates.  In  order  to  transform  ultrasoimd  images  into  the 
coordinate  system  of  the  HMD  camera  the  relationships  between  ultrasoimd 
images  and  the  ultrasound  tracker  and  the  HMD  camera  and  the  HMD  tracker 
must  be  established.  A  special  calibration  jig  is  used  to  determine  these  trans¬ 
formations  periodically.  This  system  allows  for  motion  of  both  the  user  and  the 
patient.  The  accuracy  of  the  overlay  is  not  reported. 
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2.2  Other  Applications 

2.2.1  Boeing 

A  group  at  Boeing  has  developed  “An  Application  of  Heads-Up  Display  Tech¬ 
nology  to  Manual  Manufacturing  Processes”  [Caudell  and  Mizell,  19921.  The 
goal  of  this  work  is  to  overlay  manufacturing  instructions  on  images  of  the 
manufactming  process  and  display  them  to  a  worker  using  an  HMD.  The  in¬ 
structions  to  be  overlayed  are  derived  from  a  CAD  model.  The  system  uses  a 
6DOF  Polhemus  3D  Isotrack  magnetic  tracking  system  attached  to  the  HMD 
to  generate  the  overlay.  A  calibration  jig  is  used  to  establish  the  relationship 
between  the  HMD  and  the  work  site.  Given  the  relationship  between  the  HMD 
and  the  work  site  an  overlay  can  easily  be  generated.  The  accimacy  of  this 
system  is  not  reported. 


2.2.2  Columbia  University 

A  group  at  Columbia  University  has  developed  a  method  for  “Knowledge-Based 
Augmented  Reality”  [Feiner  et  al.,  1993].  The  goal  of  this  work  is  to  overlay 
instructions  for  repairing  a  laser  printer  with  images  of  the  laser  printer.  The 
instructions  are  derived  from  a  knowledge-based  system.  A  Logitech  3D  ultra¬ 
sonic  tracking  system  and  an  Ascension  Technology  magnetic  tracking  system 
are  used  to  determine  the  position  and  orientation  of  an  HMD  and  several  key 
parts  of  the  laser  printer.  Using  the  position  and  orientation  information  from 
the  tracking  system  an  instruction  overlay  is  generated  and  displayed  to  the 
user  via  the  HMD.  Frame  rates  of  about  15hz  are  reported.  Accuracy  is  not 
reported. 


2.3  Discussion 

As  discussed  in  Section  1.4  the  transformation  which  maps  model  coordinates 
to  image  coordinates  can  be  divided  into  several  pieces.  All  of  the  methods 
discussed  above  take  this  approach  and  all  of  them  calculate  a  transformation 
which  registers  the  model  with  some  world  (reference)  coordinate  system.  Ini¬ 
tially,  oxrr  discussion  will  consider  only  this  piece  of  the  larger  transformation. 

Current  methods  of  registering  the  model  with  the  world  coordinate  system 
are  somewhat  limited.  Many  of  the  approaches  use  magnetic  trackers  to  deter¬ 
mine  the  transformation  which  will  align  the  two  coordinate  frames.  Magnetic 
trackers  have  several  significant  shortcomings  which  limit  their  effectiveness 
for  enhanced  reality  visualization.  Magnetic  trackers  have  a  very  short  range, 
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typically  a  few  feet.  Accuracies  are  limited  to  ~6mm  and  Perhaps  the 

most  significant  limitation  is  that  magnetic  and  metallic  objects  can  severely 
degrade  the  performance  of  magnetic  trackers.  This  is  a  sigmficant  limitation 
for  surgical  applications  as  most  operating  rooms  are  loaded  with  metallic  and 
magnetic  objects.  In  addition,  current  advances  in  intra-operative  imaging 
have  made  it  possible  to  obtain  MR  images  of  a  patient’s  internal  anatomy 
during  sirrgery.  This  environment  precludes  the  use  of  magnetic  trackers. 

Several  other  t5T)es  of  sensors  are  also  used  to  determine  the  transformation 
which  will  register  the  model  with  the  world  coordinate  system.  These  sensors 
also  have  limitations  which  make  them  less  than  desirable  for  enhanced  reality 
visualization.  Ultrasonic  trackers  have  range  and  accuracy  limitations  similar 
to  those  of  magnetic  trackers.  While  they  are  not  susceptible  to  magnetic 
interference  they  are  limited  to  line  of  sight  operation.  Mechanical  digitizers 
have  good  accuracy  but  are  cumbersome  and  have  limited  range.  The  laser 
scanner  provides  very  accurate  position  information  but  also  has  a  short  range 
and  is  limited  to  line  of  sight  operation. 

All  of  the  methods  cited  above  use  active  sensors  to  collect  the  data  required 
to  register  the  model  with  the  world  coordinate  system.  This  reqmres  either  a 
special  environment  (mechanical  digitizers,  magnetic  trackers  and  ultrasonic 
trackers)  or  a  cumbersome  piece  of  equipment  (laser  scanner  and  X-ray).  There 
are  also  many  situations  where  active  emissions  are  not  desirable. 

The  registration  produced  by  most  of  the  medical  applications  (the  excep¬ 
tions  are  the  work  being  done  by  the  groups  at  the  University  of  North  Carolina 
and  Stanford  University),  is  limited  to  a  fixed  patient  and  sensor  configuration. 
They  do  not  allow  for  patient  motion.  This  is  because  the  alignment  between 
the  patient  and  the  world  coordinate  system  is  implicitly  determined  during  the 
initial  calibration  phase  and  is  not  monitored.  It  is  not  clear  that  it  is  possible 
to  extend  these  methods  to  allow  for  motion.  [Crimson  et  al. ,  1994]  claim  that 
their  method  is  extensible  to  cover  patient  motion,  however  it  is  not  clear  that  it 
is  practical  or  possible  to  do  so  using  a  laser  scanner.  In  a  surgical  environment, 
with  all  but  the  surgical  site  hidden  under  sterile  drapes,  it  is  doubtful  that 
enough  patient  surface  area  will  be  visible  to  the  scanner  to  allow  registration 
of  the  patient  and  model.  In  addition,  using  a  laser  in  the  operating  room  might 
be  distracting  to  the  surgeons. 

These  methods  also  require  a  high  degree  of  operator  action  to  register  the 
model  with  the  patient.  T3q)ically  the  operator  must  measure  data  points  by 
hand  or  edit  data  that  was  semi-automatically  collected.  At  least  one  of  the 
methods  requires  about  half  an  hour  to  produce  a  single  registration. 

While  all  of  the  medical  applications  register  a  model  obtained  from  presur- 
gical  imaging  to  a  world  coordinate  system,  only  two  of  the  applications  (Mas¬ 
sachusetts  Institute  of  Technology  and  University  of  North  Carolina)  actually 


^These  accuracies  are  for  a  sensor  located  between  10  and  70cm  from  the  source. 
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produce  enhanced  reality  visualizations.  Neither  of  these  methods  can  handle 
dynamic  changes  in  focus,  zoom  or  apertime  of  the  camera  used  to  obtain  the 
raw  image  for  the  enhanced  reality  visualization.  And  only  the  method  pro¬ 
duced  by  the  group  at  the  University  of  North  Carolina  allows  the  surgeon  to 
change  the  viewpoint  of  the  enhanced  reality  visualization. 

None  of  the  current  approaches  to  enhanced  reality  visualization  are  partic¬ 
ularly  well  suited  to  a  surgical  environment.  The  sensors  used  are  active  and 
are  frequently  cumbersome.  A  good  solution  for  a  surgical  environment  should 
allow  for  patient  motion  and  should  allow  the  surgeon  to  see  enhanced  reality 
visualizations  from  different  view  points.  It  should  be  fully  automatic  follow¬ 
ing  a  straight  forward  initial  calibration.  And  the  method  should  allow  for 
dynamic  changes  in  the  camera  parameters.  We  will  consider  optical  sensors 
for  a  number  of  reasons.  Small,  light  weight  and  inexpensive  video  cameras 
are  readily  available.  Very  accurate  results  can  be  achieved  with  these  cameras 
which  can  operate  over  long  ranges.  Fiirther,  we  will  obtain  the  information 
required  to  register  the  model  with  the  raw  image  entirely  from  the  raw  image. 
This  has  the  advantage  of  ensiiring  that  registration  information  is  available 
exactly  when  raw  images  are  available  to  produce  an  enhanced  reality  visual¬ 
ization.  Since  the  goal  of  enhanced  reality  visualization  is  to  enhance  an  image, 
the  raw  image  will  likely  contain  a  lot  of  information  valuable  to  performing 
the  enhancement.  Almost  all  of  the  current  approaches  ignore  the  information 
contained  in  the  raw  image  opting  for  what  is  essentially  a  closed  loop  solution. 

Recently  [Wells  et  al. ,  1993]  proposed  a  method  of  enhanced  reality  visualiza¬ 
tion  using  video  information  exclusively,  however  the  method  requires  manual 
alignment  of  the  model  with  the  video  image  and  in  some  cases  requires  mark¬ 
ers  to  appear  in  both  the  MR  image  and  video  image.  An  optical  tracker  has  also 
been  proposed  by  a  group  at  the  University  of  North  Carolina  [Wang  et  al. ,  1990, 
Ward  et  al. ,  1992,  Gottschalk  and  Hughes,  1993,  Azvuna  and  Biship,  1994].  This 
method  is  essentially  another  active  sensor  not  tmlike  a  laser  scanner.  It  uses 
a  “sea-of-lights”  consisting  of  nearly  1000  LED’s  mounted  in  the  ceiling  tiles 
of  10’  by  12’  room.  Three  cameras  mounted  on  an  HMD  are  aimed  at  the  ceil¬ 
ing  while  the  LED’s  are  flashed  sequentially.  This  “optical  tracker”  requires  a 
special  ceiling  which  must  be  calibrated  and  a  signiflcant  amount  of  additional 
hardware  (3  extra  cameras,  LED  control,  etc).  Neither  of  these,  proposals  are 
suitable  in  our  application  for  the  reasons  cited  above. 


Chapter  3 

Camera  Calibration 


3.1  Camera  Model 

The  pin-hole  camera  is  frequently  used  to  model  the  transformation  from  world 
coordinates  to  image  plane.  The  pin-hole  model  uses  the  perspective  projection 
model  of  image  formation.  Orthographic  projection  with  scale  or  weak  perspec¬ 
tive  is  used  in  many  computer  vision  applications,  however  it  is  not  accurate 
enough  across  the  entire  image  for  our  application.  For  example  consider  an  ob¬ 
ject  with  5cm  of  depth  located  100cm  away  from  the  camera  and  5cm  off-axis, 
see  Figure  3-1.  Under  orthographic  projection  points  a  and  h  are  collocated, 
however  in  a  real  image  (using  a  25mm  lens)  the  points  are  5  pixels  apart.  The 
effect  is  significant  even  with  the  object  only  slightly  off  center.  The  left  side 
of  Figure  3-2  shows  a  camera  centered  Cartesian  coordinate  system.  The  optic 
axis  of  the  camera  is  coincident  with  the  z  axis.  The  image  plane  is  parallel  to 
the  xy  plane  and  located  a  distance  /  from  the  origin.  Even  though  the  image 
plane  is  not  required  to  be  parallel  to  the  xy  plane,  most  camera  models  do  not 
explicitly  consider  this  possibility.  Image  plane  pitch  0x  and  tilt  dy  are  usually 
reflected  in  pose.  We  will  start  with  the  assumption  that  0^  =  9y  =  0  and  then 
in  Chapter  4  we  will  show  how  our  method  implicitly  models  image  plane  pitch 
and  tilt.  The  point  where  the  image  plane  and  the  optic  axis  intersect  is  known 
as  the  principal  point.  Under  perspective  projection  a  point  Pc  =  [xc  yc  Zc] 
projects  to  point  p  =  [x  y]  on  the  image  plane  by  the  following  equations^: 

®  (3.1) 

y  =  (3.2) 

Unfortunately,  we  are  not  able  to  directly  access  the  image  plane  coordinates. 
Instead  we  have  access  to  an  array  of  pixels  in  a  frame  buffer  or  computer 

^We  represent  points  as  row  vectors  rather  than  column  vectors.  This  means  that  points 
will  be  premultiplied  instead  of  postmultiplied  when  applying  a  transformation.  This  is  the 
exact  opposite  of  what  is  t5rpically  used  in  computer  vision,  however  it  is  the  notation  that  the 
author  was  first  exposed  to  and  what  has  stuck. 
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Figure  3-1:  Effect  of  orthographic  projection  assumption. 


Figure  3-2:  Transformation  from  world  coordinates  to  the  image  plane  of  an 
ideal  pin-hole  camera. 
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memory.  In  order  to  understand  the  relationship  between  image  plane  coor¬ 
dinates  and  the  array  of  pixels,  we  must  examine  the  imaging  process.  The 
image  plane  of  a  CCD  camera  is  a  rectangular  array  of  discrete  light  sensitive 
elements.  The  output  of  each  of  these  elements  is  proportional  to  the  amount 
of  light  which  falls  on  it.  The  values  for  each  of  these  elements  are  read  out  one 
element  at  a  time  row  after  row  imtil  the  entire  sensor  array  has  been  read. 
This  analog  signal  is  processed  by  a  frame  grabber  which  converts  the  camera’s 
output  to  digital  values  and  stores  them  in  the  frame  buffer.  The  result  is  an 
array  of  digital  values  in  the  memory  of  the  computer.  The  rows  of  the  array 
correspond  to  the  rows  of  the  image  sensor.  In  general,  this  is  not  the  case 
for  the  columns.  A  s3mchronization  signal  is  provided  between  rows,  however 
the  frame  grabber  samples  the  signal  within  a  row  asynchronously  and  at  an 
independent  frequency  which  may  result  in  a  different  number  of  colrunns  per 
row  than  were  present  in  the  camera.  Synchronization  errors  can  also  cause 
the  rows  to  not  line  up.  This  is  known  as  pixel  jitter  and  in  extreme  cases  can 
cause  the  x  and  y  axes  to  appear  non-orthogonal  or  skewed  [Lenz  and  Tsai, 
1988].  Most  camera  models  omit  skew  angle  6^  (the  angle  between  the  x  and  y 
axes  minus  90°).  We  will  start  with  the  assumption  that  ^xy  =  0  for  simplicity 
and  then  in  Chapter  4  we  will  show  how  our  method  implicitly  models  skew 
angle.  A  single  element  of  the  array  in  memory  is  commonly  called  a  pixeP. 
We  will  refer  to  the  row  and  column  number  of  a  given  pixel  as  y'  and  x'  respec¬ 
tively.  Several  parameters  are  defined  to  quantify  the  relationship  between 
the  array  in  memory  and  the  coordinate  system  of  the  image  plane,  xq  and  yo 
are  the  pixel  coordinates  of  the  principal  point,  and  Sy  are  the  number  of 
pixels  in  memory  per  unit  distance  in  the  x  and  y  direction  of  the  image  plane. 
These  parameters  along  with  /,  fy  and  0xy  are  intrinsic  or  internal  camera 
calibration  parameters.  The  projection  of  point  Pc  to  point  p'  =  [x'  y']  in  memory 
is  described  by  the  following  equations: 

x'  —  xo  =  /-Sx— -  (3.3) 

•Zc 

y'  -yo  =  fsy—  (3.4) 

Zc 

The  right  half  of  Figure  3-2  shows  an  arbitrary  Cartesian  coordinate  system 
which  we  will  refer  to  as  the  world  coordinate  system.  A  point  Pw  in  the  world 
coordinate  system  is  transformed  into  the  camera  centered  coordinate  system 

^As  noted  above  pixels  in  memory  can  differ  in  size  from  the  underlying  image  sensing 
elements.  Many  of  the  measures  of  accuracy  cited  both  in  this  work  and  in  the  hterattire 
should  actually  be  made  relative  to  the  image  sensing  elements.  This  is  straight  forward  for  a 
cahbrated  camera.  We  are  working  with  uncedibrated  cameras  and  the  relationship  between 
image  sensing  elements  and  pixels  in  memory  is  frequently  not  known.  Thus  for  simplicity  and 
in  spite  of  the  preceding,  we  will  express  our  measures  in  terms  of  pixels  in  memory. 
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by  the  following  equation: 

P^^P^n  +  T  (3.5) 

where  %  is  an  orthonormal  rotation  matrix  and  T  is  a  translation  vector.  Tl 
and  T  together  are  commonly  referred  to  as  the  extrinsic  or  external  camera 
parameters. 

The  pin-hole  model  assumes  an  ideal  camera.  Because  of  lens  distortion, 
real  cameras  deviate  from  ideal.  The  major  categories  of  lens  distortion  are: 

1.  Radial  distortion  -  the  path  of  a  light  ray  traveling  from  the  object  to  the 
image  plane  through  the  lens  is  not  always  a  straight  line. 

2.  Decentering  distortion  -  the  optic  axis  of  individual  lens  components  are 
not  always  collinear. 

3.  Thin  prism  distortion  -  the  optic  axis  of  the  lens  assembly  is  not  always 
perpendicular  to  the  image  plane. 

When  it  is  necessary  to  explicitly  model  lens  distortion  it  is  typically  sufficient 
to  model  only  radial  distortion.  Our  method,  developed  in  Chapter  4,  implic¬ 
itly  models  a  linear  approximation  to  lens  distortion.^  Using  a  25mm  lens  of 
average  quality  we  have  not  found  it  necessary  to  explicitly  model  lens  distor¬ 
tion.  The  maximum  radial  distortion  at  the  extreme  edge  of  the  image  for  our 
configuration  is  just  a  few  pixels. 


3.2  Current  Camera  Calibration  Techniques 

Research  into  the  camera  calibration  problem  has  a  long  history  originating  in 
the  field  of  photogrammetry.  For  a  more  complete  discussion  of  camera  calibra¬ 
tion  techniques  see  [Slama,  1980,  Tsai,  1987].  Camera  calibration  techniques 
can  be  divided  into  three  different  categories: 

1.  Methods  which  recover  only  intrinsic  parameters.  These  methods  gen¬ 
erally  require  a  special  calibration  object  or  stand  to  allow  the  internal 
parameter(s)  to  be  measm-ed  independent  of  other  parameters.  Also, 
these  methods  assmne  that  the  intrinsic  parameters  do  not  change.  Un¬ 
less  extreme  care  is  taken  to  ensure  otherwise,  it  is  almost  certain  that 
the  intrinsic  parameters  will  change  perhaps  by  a  significant  amount  fol¬ 
lowing  calibration.  Examples  of  these  methods  include  [Brown,  1965, 
Lenz  and  Tsai,  1988,  Maybank  and  Faugeras,  19921. 


discussion  of  this  characteristic  can  be  found  in  Appendix  A. 
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2.  Methods  which  recover  only  extrinsic  parameters  (also  known  as  geomet¬ 
ric  methods).  These  methods  assume  that  the  intrinsic  camera  param¬ 
eters  are  precisely  known.  This  is  not  always  possible  for  the  reasons 
mentioned  above.  Examples  of  these  methods  include  [Church,  1945, 
Fischler  and  Bolles,  1981]. 

3.  Methods  which  recover  both  intrinsic  and  extrinsic  parameters.  These 
methods  can  be  further  divided  into  two  categories: 

(a)  Nonlinear  optimization  methods.  These  methods  are  both  nonlinear 
and  iterative.  These  methods  typically  produce  the  most  accxmate  re¬ 
sults  but  require  a  large  number  of  features  and  a  sigmficant  amount 
of  time.  Further  they  are  frequently  not  automatic  and  need  a  good 
initial  solution  to  ensvure  convergence.  Finding  an  initial  solution 
can  be  a  difficult  problem.  Examples  of  these  methods  include  iFaig, 
1975,  Sobel,  19741. 

(b)  Linearization  methods.  These  methods  linearize  the  nonlinear  pro¬ 
jection  equations  by  introducing  additional  constraints.  The  basic 
difference  between  members  of  this  category  is  how  the  problem  is 
linearized.  If  care  is  not  taken  when  the  equations  are  linearized 
significant  bias  can  be  introduced.  Many  of  these  methods  are  it¬ 
erative  and  under  certain  conditions  fail  to  converge.  These  meth¬ 
ods  tend  to  be  significantly  faster  than  the  nonlinear  optimization 
methods.  They  frequently  do  not  require  an  initial  solution  or  can 
calculate  one  with  relative  ease.  These  methods  still  require  a  large 
number  of  points  for  good  results.  Examples  of  these  methods  in¬ 
clude  [Faugeras  and  Toscani,  1987,  Grosky  and  Tambunno,  1987, 
Tsai,  1987,  Goshtasby,  1987,  Ganapathy,  19841. 

None  of  the  current  solutions  to  the  camera  calibration  problem  are  ide¬ 
ally  suited  to  enhanced  reality  visualization.  The  linearization  methods  come 
closest  to  meeting  the  requirements  of  enhanced  reality  visualization,  however 
there  is  room  for  improvement. 


Chapter  4 
Our  Solution 


We  have  developed  a  novel  method  for  determining  the  relationship  between 
model  and  image  coordinates.  Figure  4-1  provides  an  overview  of  our  method. 
Two  key  insights  lead  to  a  significant  simplification  of  the  problem.  These 
insights  are: 

1.  It  is  not  necessary  to  separate  the  intrinsic  and  extrinsic  parameters  for 
enhanced  reality  visualization. 

2.  It  is  possible  to  recover  depth  information  from  a  single  2D  image. 

Utilizing  these  insights  produces  a  solution  that  is  particularly  well  suited 
to  enhanced  reality  visualization  and  meets  the  requirements  discussed  in 
Chapters  1  and  2.  Our  solution  is  most  closely  related  to  the  linearization 
methods  described  in  Chapter  3. 


Enhanced  Reality 
Visualization  of 
the  Patient  with 
the  Brain  Added 


Figure  4-1:  Overview  of  this  report  showing  the  circular  fiducials. 
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4.1  A  Perspective  Transformation  is  Enough 

The  camera  calibration  problem  is  typically  posed  as  follows: 


where: 
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Af  is  a  matrix  of  model  points  of  the  form  [X  r  Z  1] ,  /  is  a  matrix  of  image  points 
of  the  form  [x  y  z]  resulting  from  the  projection  of  M  onto  the  image  plane,  X  is 
an  external  camera  calibration  matrix,  TZ  is  an  orthonormal  rotation  matrix,  T 
is  a  translation  vector,  and  C  is  an  internal  camera  calibration  matrix.  Image 
points  are  expressed  in  homogeneous  coordinates  to  allow  the  perspective  pro¬ 
jection  to  be  captured  using  linear  equations  iDuda  and  Hart,  19731.  The  pixel 
coordinates  of  an  image  point  [x'  y']  are  determined  by  the  following  relation¬ 
ships:  x'  —  xjz  and  y'  =  yjz.  These  relationships  are  very  similar  to  (3.3)  and 
(3.4).  In  fact,  x,  y  and  2  are  analogous  to  the  camera  centered  coordinates  x^,  y^ 
and  2c- 

The  rdtimate  goal  of  most  camera  calibration  is  to  enable  the  recovery  of 
metric  3D  information,  such  as  the  pose  (position  and  orientation)  of  an  object, 
from  its  two  dimensional  image.  Clearly,  to  recover  the  pose  of  an  object  it 
is  necessary  to  separate  the  intrinsic  and  extrinsic  parameters.  Separating 
the  parameters  is  difficult  iGanapathy,  1984].  The  problem  is  nonlinear  and 
several  of  the  parameters  are  closely  coupled.  In  the  presence  of  noise  a  single 
solution  to  the  camera  calibration  problem  does  not  exist,  rather  there  exists 
a  set  of  solutions.  These  solutions  can  differ  significantly  and  yet  give  rise 
to  nearly  identical  images.  For  example,  in  the  presence  of  noise  significant 
trade  offs  can  be  made  between  and  /.  This  can  result  in  a  solution  which 
looks  good  from  one  view  point  but  where  neither  the  intrinsic  nor  extrinsic 
parameters  are  correct.  The  fact  that  the  optimal  solution  for  one  view  point 
may  not  be  the  globally  optimal  solution  is  at  the  heart  of  what  makes  camera 
calibration  hard. 


4. 1.  A  PERSPECTIVE  TRANSFORMATION  IS  ENOUGH 
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Camera  calibration  is  often  performed  as  a  preliminary  step  in  many  appli¬ 
cations.  A  set  of  camera  calibration  parameters  is  recovered  and  the  intrinsic 
parameters  are  used  for  future  images.  This  assumes  that  a  globally  optimal 
set  of  intrinsic  parameters  has  been  recovered  and  that  the  parameters  are 
fixed.  By  using  multiple  images  or  multiple  calibration  planes  the  solution  can 
be  improved  in  a  global  sense.  However,  in  the  presence  of  noise  small  errors 
can  lead  to  large  errors  for  view  points  significantly  different  than  those  used 
during  calibration.  Further,  the  intrinsic  camera  calibration  parameters  are 
not  fixed.  They  change  with  the  focus  and  aperture  settings.  For  example,  the 
principle  point  can  shift  by  8  pixels  or  more  with  adjustments  to  focus  [Willson 
and  Shafer,  19931.  The  effective  focal  length  /  also  varies  with  focus  and  aper¬ 
ture  settings.  Zoom  lenses  take  this  variability  to  an  extreme,  enabling  large 
changes  to  /.  Lens  distortion  also  varies  with  changes  to  focus  and  aperture 
[Brown,  19651. 

In  enhanced  reality  visualization  we  are  interested  in  the  total  transforma¬ 
tion  from  model  to  image  coordinates.  We  do  not  need  to  separate  intrinsic  and 
extrinsic  parameters  to  generate  an  enhanced  reality  image.  All  of  the  param¬ 
eters  comprising  all  of  the  intrinsic  and  extrinsic  calibration  parameters  can  be 
composed  into  a  single  3x4  matrix.  This  insight  is  not  new,  but  how  we  apply 
it  is.  The  following  matrix  equation  is  equivalent  to  (4.1)  and  the  combination 
of  (3.3),  (3.4)  and  (3.5). 

I^MV  (4.2) 

where: 

V  =  XC 

riisxf  +  ri3So  ri2Sy/  +  T\zyo 
r2iSxf  +  r23®0  r22Sy/  +  r23yo 
T'ZlSxf  +  r33a;o  Tz2Syf  -|- 
^x^xf  ^y^yf  ^zVO 

For  our  purposes  finding  values  for  the  elements  of  P  is  sufficient. 

A  more  general  internal  calibration  matrix  can  be  defined  as  follows: 

Cll  Ci2  Ci3 

C  =  C21  C22  C23  (4.4) 

.  C31  C32  C33 

This  new  definition  of  C  has  9  degrees  of  freedom.  These  degrees  of  freedom 
correspond  to  xq,  yo,  Sy,  f,  6^,  and  9z.  Only  5  of  these  9  degrees 

of  freedom  are  vmambiguous.  s^,  Sy  and  /  actually  constitute  2  degrees  of 
freedom.  This  is  equivalent  to  saying  that  C  is  only  defined  up  to  a  scale  factor. 
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^x,  Sy  and  6z  are  redundant  degrees  of  rotational  freedom  which  are  captured 
in  X.  Therefore  the  internal  calibration  matrix  used  in  (4.1)  needs  only  slight 
modification:  the  addition  of  a  skew  parameter  6^.  The  skew  parameter  is 
added  in  the  1®*  colmnn,  2“'^  row  of  C.  The  value  of  the  skew  parameter  is 
equal  to  tan  6-^  or  change  in  the  x  coordinate  based  on  the  y  coordinate.  With 
the  addition  of  this  parameter  to  C  the  perspective  transformation  {XC)  matrix 
becomes; 

’’ll-Sx/  +  ri2  tan  +  tizXq 
j)  _  7‘21<Sx/  +  7’22tan0xy  +  T*23®O 

T’ai-Sx/  +  7’32  tan  ^xy  +  t^zxq 
^x^xf  "h  tan  Oxy  T  tzXQ 

V  can  exactly  model  non-orthogonal  frame  buffer  axes  (^xy  /  0)  and  per¬ 
spective  projection  with  the  image  plane  not  perpendicular  to  the  optic  axis 
(Ox  and/or  6y  0).  A  linear  approximation  of  radial  distortion  can  also  be 
obtained^.  Notice  that  the  revised  definition  of  C  has  5  degrees  of  freedom  and 
when  combined  with  the  6  degrees  of  freedom  contained  in  X  accounts  for  all 
11  degrees  of  freedom  in  V.  Thus  V  implicitly  models  5  intrinsic  and  6  ex¬ 
trinsic  parameters.  The  modeling  is  implicit  because  the  underl5dng  physical 
parameters  are  never  actually  computed.  By  formulating  the  problem  in  this 
manner,  we  avoid  the  difficulties  associated  with  decomposing  the  intrinsic  and 
extrinsic  camera  parameters.  We  solve  (4.2)  for  each  image  we  obtain.  Thus  we 
do  not  need  to  worry  about  finding  a  globally  optimal  solution,  optimal  for  this 
view  point  is  sufficient.  Further,  changes  to  the  intrinsic  camera  parameters 
are  inherently  captured. 


4.2  Depth  Information  From  a  Single  2D  Image 

Even  with  the  simplifications  made  so  far,  the  problem  of  solving  for  the  per¬ 
spective  transformation  which  maps  model  coordinates  to  image  coordinates  is 
still  nonlinear.  While  it  is  possible  to  solve  for  the  elements  of  T  using  a  mini¬ 
mum  of  6  point  features  and  an  iterative  method,  we  can  do  better.  Expanding 
(4.2)  produces: 

x'  =  xjz 

_  P\\N  -+  f2lY  -f  +  P41 

■pizX  P23F  +  pzzZ  -f-  P43 

discussion  of  this  characteristic  can  be  found  in  Appendix  A. 


ri2Syf  -h  7-132/0  7-13  ■ 

'r22Syf  +  7-237/0  7-23 
7-32-Sy/  +  7-337/0  7-33 
^y^yf  izVo  tz  . 
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y'  =  viz 

_  P12-X'  +P22^  +  P32^  +P42 

PI3X  +  ^>23^^  +  P33-^  +  P43 

If  we  knew  the  value  of  2:  =  p\zX  ■{■P23X +P33-^ +^>43  then  solving  for  the  elements 
of  P  would  reduce  to  3  sets  of  linear  equations.  Using  spatial  features,  rather 
than  point  features,  enables  us  to  measure  a  quantity  that  is  proportional 
to  z  for  each  feature.  We  call  this  quantity  the  local  scale  factor.  The  local 
scale  factor  is  equal  to  the  focal  length  of  the  camera  divided  by  the  depth  of  the 
featiire  (syf/z).  We  use  a  circular  fiducial  as  our  spatial  features  [Landau,  1987, 
Thomas  and  Chan,  1989,  Hussain  and  Kabuka,  1990,  Chaudhuri  and  Samanta, 
1991,  Safaee-Rad  et  al,  1992].  The  exact  nature  of  these  fiducials  will  be 
discussed  in  Chapter  5.  In  essence,  by  using  a  spatial  feature  we  are  able  to 
recover  2|D  information  from  a  single  video  image.  The  idea  of  using  spatial 
features  is  not  new,  however  our  use  of  the  information  provided  by  spatial 
features  is.  We  will  modify  our  matrix  equation  slightly  by  multiplying  I  and  V 
by  Since  /  is  expressed  in  homogeneous  coordinates,  multiplying  I  and/or  V 

by  an  arbitrary  constant  has  no  effect  on  the  solution  (^P  and  P  represent  the 
same  solution).  We  will  refer  to  the  elements  of  as  pij  and  define  T  - 
X'  is  a  matrix  of  image  points  of  the  form  [»*  y*  I/5] .  5  is  the  local  scale  factor  at 
the  image  point.  The  pixel  coordinates  of  an  image  point  [a;'  y']  are  determined 
by  x'  =  sx*  and  y'  =  sy*.  x*,  y*  and  1/s  are  similar  to  the  homogeneous  image 
coordinates  defined  in  (4.1).  The  elements  of  P  can  be  solved  for  using  the 
following  three  sets  of  linear  equations  and  as  few  as  four  spatial  features. 

Xi  =  {pn^i  +  P2iYi  +  PsiZi  +  P4i)  (4.8) 

yt  —  (pi2-^i  +  P22li  +  P32^i  +  P42)  (4.9) 

l/si  =  {pizXi  +  p2zYi  +  pzzZi  +  P4z)  (4.10) 

Where  s*,  y*  and  l/si  are  the  components  of  the  z*  image  point  and  Xi,  Yi  and 

Zi  are  the  components  of  the  model  point. 

4.3  Implementation 

It  should  be  noted  that  by  using  a  spatial  feature  the  problem  of  determining  the 
relationship  between  model  coordinates  and  image  coordinates  becomes  linear 
and  the  solution  can  be  found  using  as  few  as  four  features.  Also,  since  all  of 
the  calibration  parameters  (both  intrinsic  and  extrinsic)  are  bundled  into  P  we 
are  not  required  to  make  any  assmnptions  about  the  stability  of  the  intrinsic 
parameters.  This  is  important  in  d5mamic  environments  because  changes  to  the 
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focus,  aperture  and/or  zoom  will  likely  be  needed  during  the  enhanced  reality 
visualization. 

Given  a  model  and  an  image  containing  at  least  four  fiducials  with  known 
model  coordinates  it  is  a  relatively  straightforward  task  to  solve  for  V  and 
generate  an  enhanced  reality  image.  The  basic  steps  are  as  follows: 

1.  Grab  an  image 

2.  Locate  the  fiducials  in  the  image  and  calculate  the  local  scale  factor 

3.  Establish  correspondence  between  fiducials  in  the  image  and  fiducials  in 
the  model 

4.  Solve  for  V 

5.  Using  V,  project  the  model  into  the  image 

6.  Display  the  enhanced  reality  image 

7.  Repeat 

4.3.1  Image  Acquisition 

720  X  480  pixel  images  are  acquired  using  a  Pulnix  TMC-50  color  CCD  camera 
or  a  Panasonic  WV-CD50  monochrome  CCD  camera  with  either  a  25mm  or 
16mm  lens  or  a  CIDTEK  monochrome  CID  camera  with  a  29mm  lens  and  a 
Sun  VideoPix  frame  grabber.  Both  the  cameras  and  frame  grabber  are  rela¬ 
tively  inexpensive  commodity  items.  VideoPix  is  only  capable  of  grabbing  ~  4 
monochrome  or  ~1  color  frame  per  second.  This  severely  limits  the  update  rate 
of  the  enhanced  reality  visualization.  Furthermore,  VideoPix  only  provides  1~ 
bits  of  luminance  information.  Even  though  pixel  values  range  from  0  to  255 
the  actual  resolution  is  less  than  half  of  this  range.  We  intend  to  upgrade  to 
a  better  camera/frame  grabber  combination  in  the  futiire,  however  the  current 
combination  is  sufficient  to  demonstrate  our  method. 

4.3.2  Fiducial  Location 

The  location  (centroid)  and  local  scale  factor  (semi-major  axis  of  the  fiducial’s 
image  divided  by  the  radius  of  the  fiducial)  are  calculated  using  moments. 
Chapter  5  describes  these  calculations  in  detail. 

4.3.3  Correspondence 

Once  an  initial  correspondence  has  been  established,  it  should  be  possible  to 
maintain  correspondence  by  tracking  the  fiducials.  The  idea  is  that  if  images 
can  be  processed  fast  enough  the  locations  of  the  fiducials  should  not  change 
very  much.  Given  the  correspondence  from  the  last  image,  we  look  for  fiducials 
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in  the  new  image  within  a  small  region  around  each  fiducial’s  last  location.  If 
exactly  one  fiducial  is  found  in  that  region  then  the  correspondence  for  that 
fiducial  is  maintained.  If  at  least  some  minimum  nvunber  of  correspondences 
(>  4)  are  maintained,  then  the  fiducials  have  been  successfully  tracked.  If 
correspondence  is  lost  or  no  previous  correspondence  exists  than  an  initial 
correspondence  must  be  established.  The  initial  correspondence  is  performed 
using  a  modified  version  of  the  alignment  method  [Huttenlocher,  19881.  The 
alignment  method  is  modified  to  use  scale  information  as  well  as  some  orien¬ 
tation  constraints  to  significantly  prime  the  search  space.  The  three  major 
constraints  used  are  listed  below: 

•  Each  fiducial  is  visible  from  only  one  side.  Specifically,  the  dot  product 
of  fiducial’s  normal  and  the  viewing  direction  must  be  negative  or  the 
fiducial  is  definitely  not  visible. 

•  The  local  scale  factor  Si  establishes  the  relative  depth  of  the  fiducials  up 
to  the  accuracy  of  the  measimement. 

•  The  local  scale  factor  Si  is  used  to  effectively  unproject  the  image  point 
[a;^  y'j^.  Recall  that  x\  =  x\si  and  y[  =  y^Si.  If  C  is  close  to  a  diagonal  matrix 
or  if  a  reasonable  guess  exists  for  at  least  some  of  the  intrinsic  camera 
parameters,^  then  x*^  and  yf  can  be  treated  as  the  x  and  y  components 
of  the  camera  centered  coordinates  for  the  point.  Since  scaling  is  not 
allowed  between  world  coordinates  and  camera  centered  coordinates  any 
transformation  between  the  two  must  have  a  scale  factor  close  to  unity. 

The  alignment  method  uses  triples  of  model  and  image  points  to  generate 
possible  transformations  from  model  to  image  coordinates.  There  are  a  total 
of(™j(‘j3!  different  sets  of  triples  where  m  is  the  number  of  model  points 
and  i  is  the  number  of  image  points.  In  general  the  alignment  method  pro¬ 
duces  2  solutions  for  every  set  of  model  and  image  points.  This  is  because  the 
alignment  method  assvunes  orthographic  projection  and  is  therefore  xmable  to 
resolve  reflections  about  the  xy  plane.  For  an  image  and  a  model  both  con¬ 
taining  7  points  the  alignment  method  generates  «  15, 000  possible  solutions. 
Utilizing  the  constraints  listed  above  significantly  reduces  this  nximber.  For  20 
random  views  of  an  object  with  7  fiducials,  the  nvunber  of  possible  solutions  was 
reduced  from  15,000  to  an  average  of  100.  For  one  of  the  views  the  constraints 
reduced  the  number  of  possible  solutions  to  3.  Some  of  these  views  are  shown 
in  Chapter  7.  The  constraints  are  applied  using  only  the  set  of  three  image  and 
model  points  and  before  the  remaining  model  points  are  transformed  or  global 
constraints  are  checked.  This  reduces  the  computational  cost  of  establishing 
correspondence  for  an  object  with  7  fiducials  by  over  2  orders  of  magnitude. 

2  0xp6ri6iiCG  only  xq  snd.  yo  nood.  to  bo  ©stiiiiHtod  3Jid  it  is  sufficiont  to  us©  th©  g©oiii©tric 

c©nt©r  of  th©  image  plan©  as  a  fixed  estimate  of  their  values. 
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Pseudo  code  for  establishing  correspondence  follows: 

1.  If  correspondences  >  minimum  =>  done. 

2.  If  0  <  correspondences  <  minimum  ^  establish  the  required  additional 
correspondences . 

3.  Otherwise  =>  establish  correspondences. 

(a)  Find  candidate  transformations  from  model  to  image  points. 

i.  Take  each  possible  triple  of  model  points  and  pair  it  with  each 
permutation  of  three  image  points. 

ii.  For  each  group  of  model  and  image  points  check  that  the  model 
points  are  consistent  and  determine  from  which  side  they  are 
visible.  Pi  and  Ui  are  the  location  and  normal  of  the  model 
point  in  the  triple. 

•  Find  the  normal  to  the  triple 

np  =  (P2  -  Pi)  X  (Ps  -  Pi) 

•  Verify  that  the  model  points  can  be  visible  simultaneously 

SIGN(np  X  ni)  =  SIGN(ri.p  x  712)  =  SIGN(rap  x  773) 

hi.  Transform  (rotate  and  translate)  the  model  points  of  the  triple 
so  that  the  first  point  is  at  the  origin  and  the  points  he  in  the  xy 
plane.  There  are  two  transformations  which  will  accomplish  this, 
choose  the  one  which  will  make  the  model  points  right  side  up  as 
determined  in  Step  3(a)ii.  Call  the  transformed  model  points  M*. 
iv.  Unproject  the  image  points  of  the  triple  and  translate  so  that  the 
first  point  is  at  the  origin.  Call  the  transformed  image  points  I*. 

V.  Calculate  the  transformation(s)  A  which  maps  M*  to  I*  using  the 
alignment  method. 

vi.  Check  that  the  solution  computed  in  Step  3(a)v  is  consistent. 

•  Use  relative  depth  constraints  to  eliminate  one  of  the  two 
solutions. 

•  Verify  that  the  solution  does  not  turn  the  model  points  upside 
down.  Step  3(a)ii  ensured  that  the  model  points  were  right 
side  up.  As  long  as  the  transformation  calculated  in  Step 
3(a)v  does  not  rotate  more  than  90°  about  any  axis  in  the  xy 
plane  the  model  points  will  remain  right  side  up.  z  is  the  unit 
vector  in  the  z  direction. 


SIGN(llzA’ll)  >  0 
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•  Verify  that  the  scale  factor  is  close  to  unity 

vii.  If  the  transformation  computed  in  Step  3(a)v  is  consistent,  com¬ 
pose  the  total  transformation  using  the  transformations  from 
Steps  3(a)iii  and  3(a)v  and  the  translation  from  Step  3(a)iv. 

(b)  For  each  consistent  transformation  determine  correspondences  be¬ 
tween  the  remaining  image  and  model  points  and  calculate  the  cost. 

i.  Transform  the  remaining  model  points  into  camera  centered  co¬ 
ordinates. 

ii.  For  each  transformed  model  point 

A.  Find  the  closest  image  point 

B.  If  the  closest  image  point  is  close  enough,  add  the  correspon¬ 
dence  to  the  match  and  add  the  Euclidean  distance  squared 
to  the  total  cost. 

iii.  Return  a  match  consisting  of  a  list  of  correspondences  and  the 
total  cost. 

(c)  Consolidate  matches  and  find  the  best  one. 

i.  For  each  unique  list  of  correspondences  create  a  consolidated 
match  consisting  of: 

•  The  list  of  correspondences. 

•  The  number  of  matches  containing  the  list  of 
correspondences . 

•  The  mim'rrmm  cost  among  the  matches  containing  the  list  of 
correspondences. 

ii.  Return  the  best  consohdated  match  which  is  the  one  with  the 
largest  number  of  correspondences  or  the  largest  number  of 
matches  or  the  lowest  cost,  in  that  order. 

The  algorithm  used  in  Step  2  to  establish  partial  correspondences  is  very  similar 
to  that  described  in  Step  3.  If  less  then  three  correspondences  exist,  additional 
pairs  of  model  and  image  points  are  combined  with  the  existing  correspondences 
to  form  sets  of  three  model  and  three  image  points.  If  three  or  more  correspon¬ 
dences  exist  then  triples  of  the  established  correspondences  are  used  to  form 
the  sets.  The  rest  of  the  algorithm  is  unchanged.  The  ability  to  perform  partial 
correspondence  greatly  simplifies  the  problem  of  reestablishing  correspondence 
when  most  but  not  all  of  the  fiducials  are  temporarily  occluded.  If  at  least  the 
mim'rrmm  number  of  correspondences  are  maintained  partial  correspondence 
is  not  used  to  find  correspondences  for  fiducials  that  were  occluded  but  are  now 
visible.  This  case  is  handled  nicely  as  part  of  locating  the  fiducials  described  in 
Chapter  5. 
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4.3.4  Solving  for  V 

Equations  (4.8),  (4.9)  and  (4.10)  are  used  to  solve  for  the  elements  of  V.  If 
correspondences  have  been  established  for  more  than  four  fiducials  than  an 
over-determined  system  of  linear  equations  exists.  We  solve  this  problem  in 
a  least-squares  fashion  by  minimizing  the  following  error  terms  using  House¬ 
holder’s  QR  decomposition  [Watkins,  19911. 


Iki|l2 

—  X!  “  iPnXi  -1-  j)2iYi  -t-  TPziZi  -|-  P4i)|^ 

i=i 

(4.11) 

Ikslls 

n 

=  XI  \y*i  ~  +  P22Yi  -F  ‘P22Zi  -f  P42)|^ 

i=l 

(4.12) 

Ikslls 

n  2  2 

=  X - (Pl3^i  +  P2zYi  +  +  P43) 

i=l 

(4.13) 

Householder’s  method  exhibits  good  stability  and  is  relatively  efficient.  Com¬ 
puting  the  QR  decomposition  requires  approximately  nm^  -  m^/3  flops  and 
using  the  decomposition  to  solve  for  the  unknowns  requires  approximately 
2nm  —  m^/2  flops  where  n  is  the  number  of  unknowns  and  m  is  the  number  of 
equations.  Splitting  the  problem  into  three  sets  of  equations  has  a  significant 
computational  advantage.  Equations  (4.8),  (4.9)  and  (4.10)  can  be  rewritten  in 
matrix  form  as  follows: 

=  MPi  (4.14) 

=  MV2  (4.15) 

S  =  MVz  (4.16) 

X*,  Y*  and  S  are  column  vectors  whose  components  are  x^,  yf  and  1  /si  respec¬ 

tively.  M  is  a  m  X  4  matrix  whose  z*'*"  row  is  the  z^*"  model  point  [Xi  Yi  Zi  1].  Vi  is 
the  z*^  column  of  V.  Matrix  M  is  decomposed  into  the  matrices  Q  and  R.  Since 
M.  is  common  to  all  three  equations  we  need  only  perform  the  decomposition 
once.  We  can  simply  reuse  the  decomposition  for  the  remaining  equations.  In 
essence,  you  pay  for  solving  (4.14)  and  you  get  the  solutions  to  (4.15)  and  (4.16) 
for  very  little.  Formulating  the  problem  as  three  sets  of  linear  equations  each 
with  4  unknowns  reduces  the  complexity  of  computing  the  decomposition  by  an 
order  of  magnitude  and  solving  for  the  xmknowns  by  half  an  order  of  magni¬ 
tude  compared  to  solving  a  single  set  of  linear  equations  with  12  unknowns.  In 
fact  the  QR  decomposition  need  not  be  recomputed  until  the  correspondences 
change,  further  increasing  the  computational  savings. 

4.3.5  Creating  the  Enhanced  Reality  Image 

Creating  the  enhanced  reality  image  consists  of  two  steps:  making  a  virtual 
image  (rendering  the  model)  and  combining  the  virtual  image  and  the  raw 
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image.  V  is  used  to  project  the  model  into  an  initially  empty  virtual  image. 
Currently,  models  consist  of  either  a  collection  of  points  or  a  collection  of  line 
segments.  Points  are  projected  by  multiplying  by  V  and  then  rounding  off  to  the 
nearest  pixel.  Line  segments  are  handled  by  using  V  to  project  the  endpoints 
into  the  virtual  image  and  then  drawing  a  line  between  them.  No  anti-aliasing 
or  z-buffering  is  performed.  As  a  result,  some  edges  are  jagged  and  some 
points  which  perhaps  should  not  be  visible  are.  Once  the  virtual  image  has 
been  generated  it  must  be  combined  with  the  raw  image  to  form  the  enhanced 
reality  image.  Pixels  in  the  virtual  image  have  precedence  over  pixels  in  the 
raw  image.  If  a  pixel  in  the  virtual  image  is  nonzero  (zero  being  no  information) 
then  its  value  is  placed  in  the  corresponding  location  in  the  enhanced  reality 
image.  If  a  pixel  in  the  virtual  image  is  zero  then  the  corresponding  pixel  in 
the  raw  image  is  used.  Both  the  rendering  and  combination  routines  are  a 
bit  simplistic,  but  are  sufficient  to  demonstrate  oiu*  method.  Rendering  and 
combination  are  important  problems,  however  they  are  not  the  focus  of  this 
work. 


4.3.6  Displaying  the  Enhanced  Reality  Image 

The  enhanced  reality  image  is  displayed  as  a  single  image  on  a  high  resolution 
CRT  using  the  X  window  system.  The  size  of  the  image  to  be  displayed  and 
whether  it  is  to  be  displayed  on  the  local  machine  greatly  affects  the  time  it 
takes  to  display  an  image.  In  the  current  implementation,  about  20%  of  the 
computation  time  is  spent  simply  putting  a  half  sized  enhanced  reality  image 
on  the  screen.  Creating  a  stereo  display  or  using  a  video  see-through  HMD  are 
straightforward  extensions  of  our  method. 

4.3.7  Discussion 

The  current  system  is  implemented  in  Lucid  Common  Lisp  and  runs  on  a 
SparcStation  2.  Currently,  the  two  most  limiting  components  are  the  frame 
grabber  and  the  rendering/display  system.  Depending  on  the  complexity  of  the 
model,  the  renderer  may  require  a  minute  or  more  to  perform  the  rendering. 
Using  a  simple  model,  frame  rates  of  ~2hz  can  be  achieved.  Table  4.1  shows  a 
break  down  of  the  computational  time  required  for  the  major  functions.  Low  end 
SGI  machines  such  as  the  Indy  are  capable  of  grabbing  a  full  size  color  image 
and  displaying  it  on  the  screen  at  >  30hz.  The  SGI  machines  are  also  capable  of 
rendering  a  fairly  complex  model  and  displaying  it  at  >  30hz.  If  these  times  are 
substituted  for  frame  grabbing,  rendering  and  displaying  a  frame  rate  of  ~10hz 
resffits.  The  frame  rate  woidd  be  ~20hz  if  the  time  reqmrements  for  frame 
grabbing,  rendering  and  displaying  could  be  eliminated  entirely.  Little  effort 
has  been  put  into  optimizing  the  current  implementation  for  speed.  Recoding 
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Function 

Time  Required 

Grab  Frame 

0.25s 

50.3% 

Find  Fiducials  and  Calculate 
Centroid  and  Local  Scale  Factor 

0.03s 

6.1% 

Check  Correspondences 

0.007s 

1.4% 

Calculate  Perspective 
Transformation 

0.01s 

2.0% 

Render  Model  and  Create 
Enhanced  Reality  Image 

0.1s 

20.1% 

Display  Image 

0.1s 

20.1% 

Table  4.1:  Time  required  for  major  frmctions  running  on  a  SparcStation  2  in 
Common  Lisp  using  a  simple  model  and  displaying  a  half  sized  enhanced  reality 
image. 


some  portions  and  using  a  C-30  digital  signal  processing  board  to  grab  and 
process  images  should  produce  significant  improvements  in  speed.  We  believe 
frame  rates  of  >30hz  should  be  achievable  with  this  modest  hardware. 

Assmning  that  we  are  given  a  model  to  be  used  in  creating  the  enhanced 
reality  image  is  not  unreasonable.  In  fact,  it  is  a  fundamental  assmnption  of 
enhanced  reality  visualization.  The  basic  idea  is  to  add  information  to  an  image. 
This  information  in  most  cases  is  not  visible  from  the  current  view  point  and 
must  come  from  some  source  other  than  the  raw  image  (typically  the  model). 
This  is  not  to  say  that  constructing  a  model  is  easy.  Model  building  is  simply 
not  the  focus  of  this  work.  Assuming  that  we  know  the  model  coordinates  of 
the  fiducials  in  some  cases  is  unreasonable.  This  amounts  to  assmning  that 
the  fiducials  are  part  of  the  model.  In  Chapter  7  we  present  enhanced  reality 
visualizations  of  a  test  object  and  a  plastic  skull.  The  model  for  the  test  object 
is  a  CAD-like  model  and  the  fiducials  are  part  of  it.  This  is  not  the  case  for  the 
skull.  Here,  the  model  is  a  CT  scan  of  the  skull.  The  fiducials  are  not  present 
in  the  CT  data  and  are  not  part  of  the  model.  In  this  case,  we  need  a  method 
of  determining  the  model  coordinates  of  the  fiducials.  Chapter  6  presents  the 
details  of  determining  the  model  coordinates  for  the  fiducials  used  on  the  skull. 


Chapter  5 


Feature  Detection  and 
Localization 


In  the  last  chapter  we  described  the  theory  behind  our  method.  The  success 
of  any  method  for  enhanced  reality  visualization  is  inseparably  tied  to  the 
accuracy  of  the  data  used  to  determine  to  transformation  which  maps  model 
coordinates  to  image  coordinates.  In  this  chapter  we  will  discuss  the  practical 
details  of  finding  fiducials  and  the  accuracy  with  which  their  position  and  local 
scale  factor  can  be  determined. 

5.1  Details 

The  circular  fiducials  used  in  our  method  are  detected  using  pattern  matching 
and  are  localized  using  moment  calciilations.  An  actual  size  fiducial  is  shown 
in  Figure  5-1.  A  chord  passing  near  the  center  of  the  fiducial  will  exMbit 
transitions  from  light  to  dark,  dark  to  light,  light  to  dark  and  dark  to  light. 
Figure  5-2  shows  a  blow-up  of  an  image  of  a  fiducial  and  the  intensity  profile 
of  a  chord  line.  Constraints  such  as  the  steepness  of  the  transition,  the  length 
of  the  transition  and  the  separation  between  transitions  eliminate  nearly  all 
detections  which  do  not  come  from  actual  fiducials.  By  checking  the  rows  and 
columns  of  an  image  for  collocated  occxurences  of  this  transition  pattern  the 
presence  of  an  fiducial  can  be  further  validated  and  a  rough  position  can  be 
efficiently  foimd.  The  time  required  is  linear  in  the  size  of  the  image.  In 
addition,  a  bounding  box  {x\,  X2,  yi  and  ya)  and  an  upper  and  lower  threshold 
(^upper  and  iiower)  for  each  fiducial  can  be  readily  obtained  from  this  process. 
The  boimding  box,  with  vertices  [xi  yi],  [xi  y2\,  [x2  yi]  and  [x2  1/2],  is  slightly 
larger  than  the  smallest  rectangle  aligned  with  the  axes  which  can  contain  the 
fiducial.  The  upper  and  lower  thresholds  are  used  to  rescale  the  pixel  values. 
These  values  are  needed  because  we  use  grey  scale  moments  to  find  the  centroid 
and  local  scale  factor  of  the  fiducial.  The  local  scale  factor  is  the  semi-major  axis 
of  the  fiducial’s  image  divided  by  the  radius  of  the  fiducial.  The  boimding  box 
and  thresholds  for  one  fiducial  are  completely  independent  from  those  of  the 
other  fiducials.  This  produces  an  extremely  robust  detection  and  localization 
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Figure  5-1:  Actual  size  Figure  5-2:  Enlarged  image  of  a  fiducial  with 
fiducial.  pixel  values  for  a  chord  shown  to  the  side. 


alprithm.  For  example,  large  gradients  in  average  image  intensity,  such  as 
might  be  caused  by  shadows,  have  little  effect  on  detecting  and  localizing  the 
fiducials. 

Fiducials  are  detected  by  looking  for  occurrences  of  intensity  profiles  such 
as  the  one  shown  in  Figure  5-2.  The  location  and  the  maximum  and  miniTrmm 
values  of  these  profiles  are  used  to  determine  a  bounding  box  and  an  upper  and 
lower  threshold  for  the  fiducial.  Once  a  fiducial  has  been  detected  moments 
are  used  to  calciilate  the  centroid  and  local  scale  factor  of  the  fiducial.  Detailed 
pseudo  code  used  to  locate  fiducials  follows: 

1.  Grab  a  fresh  image. 

2.  For  each  fiducial  present  in  the  last  image: 

(a)  Define  a  window  centered  at  the  last  location  with  dimen¬ 

sions  equal  to  2kr.  A:  is  a  window  size  scale  factor  and  r  is  equal  to 
the  fiducial’s  local  scale  factor  Si  in  the  last  image  times  the  fiducial’s 
radius  in  model  coordinates.^ 

(b)  Scan  the  window  for  fiducials. 

i.  For  each  horizontal  and  vertical  scan  line  in  the  region  collect 
possible  fiducial  detects. 

A.  Initialize  the  following  parameters:  h  through  U  (location 
of  transition  1—4),  Zmin  and  /max  (the  minimum  and  maximum 

^Sx/y  is  the  ratio  of  pixel  spacing  in  the  x  and  y  directions  {s^/sy).  This  quantity  is  used  to 
correct  for  the  fact  that  pixels  generally  are  not  square.  The  x  dimension  of  the  window  must 
be  scaled  by  1/sx/y 
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separation  between  transitions),  wi  through  (width  of  tran¬ 

sition  1-4),  tomax  (the  maximum  width  of  a  transition), 
through  34  (slope  of  transition  1-4),  (the  minimmn  in¬ 
tensity  change  to  be  considered  above  noise),  tapper  and  tiower 
(upper  and  lower  thresholds). 

B.  Scan  across  or  down  the  scan  line  until  the  change  in  pixel  in¬ 
tensity  <  -3inin.  Scan  back  several  pixels,  take  the  minimum 
value  and  if  it  is  less  than  tapper  store  it  in  tapper-  Update  l\,  wi 
and  31- 

C.  Continue  scanning  iintil  the  change  in  pixel  intensity  is  no 
longer  <  -3inm-  Scan  ahead  several  pixels,  take  the  maximum 
value  and  if  it  is  less  than  tiower  store  it  in  tiower-  Update  h,  wi 
and  31-  If  wi  >  tomax  go  to  Step  2(b)iA. 

D.  Continue  scanning  until  the  change  in  pixel  intensity  is  > 
3inm.  Scan  back  several  pixels,  take  the  maximum  value  and 
if  it  is  greater  than  tiower  store  it  in  tiower-  Update  h,  W2  and  32- 

E.  Continue  scanning  until  the  change  in  pixel  intensity  is  no 
longer  >  3inm.  Scan  ahead  several  pixels,  take  the  minimmn 
and  if  it  is  less  than  tapper  store  it  in  tapper-  Update  I2,  W2  and 
32-  If  tmin  —  ^2  —  h  ^  ^max  Or  3i  9^  32  gO  tO  Step  2(b)lA. 

E  Repeat  Steps  2(b)iB  and  2(b)iC  except  update  k,  wz  and  33.  If 
103  >  Wmax  or  2(^2  “*  ^l)  9^  (^3  ~  ^2)  Or  33  9^!  52  5i  go  to  Step 
2(b)iA. 

G.  Repeat  Steps  2(b)iD  and  2(b)iE  except  update  U,  and  34.  If 
W4  >  Wmax  or  (Z4  —  I3)  76  2(t2  —  h)  ~  (^3  —  h)  or  34  96  33  ~  32  ~  Si 
go  to  Step  2(b)iA. 

H.  Add  the  detection  (location,  size  and  thresholds)  to  a  list  of 
detections. 

I.  Go  to  Step  2(b)iA 

ii.  Consolidate  detections.  Detections  from  adjacent  scan  lines  are 
combined  if  they  overlap  by  at  least  50%.  Detections  from  or¬ 
thogonal  scan  lines  are  combined  if  the  detections  intersect.  All 
consolidations  retain  the  maximum  bounding  box,  the  minimmn 
Supper  and  the  maximum  ti  ower- 

iii.  Expand  the  bounding  box  as  necessary  to  ensure  the  fiducial  is 
fully  enclosed.  This  is  required  because  if  the  fiducial  is  elliptical 
in  shape  and  is  at  an  angle  to  the  x  and  y  axes,  the  bounding  box 
generated  by  the  detections  may  imder  estimate  the  size  of  the 
fiducial. 

(c)  Calculate  the  0*^,  1®^  and  2“*^  moments  of  inertia  as  well  as  the  Euler 
number  of  the  window  using  grey  scale  values. 
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(d)  If  the  Euler  niunber  is  zero  return  the  the  location  and  local  scale 
factor  [x'  y'  1/s]  for  the  fiducial. 

3.  If  there  was  a  valid  solution  for  the  last  image,  predict  the  location  and 
size  of  each  model  point  for  which  a  correspondence  did  not  exist  using 
this  solution.  Perform  Steps  2a  through  2d. 

4.  If  the  munber  of  correspondences  maintained  in  Step  2  plus  the  nrunber 
established  in  Step  3  is  less  than  the  minimum,  scan  the  entire  image  for 
fiducials  and  establish  correspondences  for  any  new  fiducials  found  using 
the  algorithm  presented  in  Section  4.3.3. 

The  zeroth,  first  and  second  order  moments  of  a  region  bounded  hy  xi,  X2,  yi 
and  y2  can  easily  be  calculated  using  the  following  formulas: 


J/2 


"^0  =  X!  pi^^y) 

x=xi  y=yi 

*2  2/2 

(5.1) 

x=xi  y=yi 

®2  2/2 

(5.2) 

=  Y  Y  pi^^y)y 

X-Xi  y-yi 

®2  2/2 

(5.3) 

=  Y  Y  pi^^  y)  [y^  + 

x=xi  y=yi 

*2  y2 

(5.4) 

=  Y  Y  pi^^  y)  i^y  +  v) 

x=xi  y=yi 

2/2 

(5.5) 

=  Y  Y  Pi^y  y)  +  h) 

(5.6) 

x=xi  y=yi 


X  and  y  are  image  coordinates  and  p{x,  y)  is  a  weight  based  on  the  value  v{x,  y) 
of  the  pixel  at  image  coordinates  x,y.  4,  ixy  and  iy  are  the  moments  of  inertia  for 
an  individual  pixel  about  the  center  of  the  pixel.  The  weights  are  determined 
by  the  following  fimction. 

{0  if  v{x,y^  >  ^upper 

1  ifv(x,y)  <  Slower  (5-'^) 

otherwise 

tupper— flower 

The  Euler  number  of  the  region  can  be  used  to  verify  that  only  a  single  object 
with  a  single  hole  is  present.  By  using  an  upper  and  lower  threshold  we  can 
ensure  that  noisy  pixels  which  are  not  on  the  fiducial  do  not  contribute  to  the 
moment  calculations  and  that  noisy  pixels  which  are  entirely  on  the  fiducial 
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contribute  fully.  Euler  numbers  can  also  be  used  to  verify  that  the  thresholds 
have  been  properly  chosen.  From  (5.1)  through  (5.6)  the  centroid  of  the  region 
and  the  moment  about  the  axis  of  greatest  inertia  can  be  calculated.^ 


X 

y 


m. 


5x/y 


mo 


rriv 


mo 


(5.8) 

(5.9) 


^mx2  -  moy^)  sin^6  +  2  (sj^/ym^y  —  moxy)  cos  6  sin  6  + 

(s^^yiny2  —  mo®^)  cos^6  (5.10) 


1  f  2m 

6  =  -  arctan  - 

2  \  s-^iyTny2 

It  is  well  known  that  the  perspective  projection  of  a  circle  is  an  ellipse.  The 
semi-major  axis  of  the  ellipse  can  be  calculated  using  the  following  equation:^ 


_ j 

^x^/^x/y/ 


a  = 


4/n 


mo  (l-f-rf/r2) 


(5.11) 


For  now  we  will  assume  that  orthographic  projection  is  a  reasonable  model  for 
the  area  immediately  surrounding  a  fiducial,  see  Figiire  5-3.  Later  we  will 
consider  the  error  introduced  by  this  assumption,  Figure  5-4.  This  error  is 
sometimes  referred  to  as  perspective  distortion.  Given  this  assumption,  the 
centroid  of  the  circle  Cc  projects  onto  the  centroid  of  the  ellipse  Ce  and  the 
diameter  d!  projects  onto  the  major  axis  of  of  the  ellipse,  a'.  The  diameter  d'  is 
parallel  to  the  image  plane  so  it  is  not  foreshortened.  The  following  equations 
relate  the  fiducial  to  its  projection. 


X 

y 


Li 


s  =  — 


a 

J' 


sx*  =  x' 


sy*  =  y’ 


z* 


(5.12) 

(5.13) 

(5.14) 


It  should  be  noted  that  5,  x',  y',  x*  and  y*  are  the  same  parameters  as  in  (4.8) 
through  (4.10).  x',  y'  and  s  can  be  easily  calculated  requiring  time  linear  in  the 
number  fiducials  and  their  size.  At  first,  the  need  to  use  5x/y  in  recovering  s 

^The  quantity  shown  for  I^ax  should  actually  be  multiplied  by  an  additional  factor  of  s^/y. 
We  have  omitted  it  for  simphcity  sake  because  it  cancels  with  the  same  factor  for  mo  in  (5.11). 

^  Our  fiducials  have  holes  in  them  and  the  additional  factor  of  ( H-  rf  /r ^ )  in  the  denominator 
corrects  for  this,  n  is  the  inner  radius  and  t-q  is  the  outer  radius. 
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Figure  5-3:  Orthographic  projection 
of  a  circle. 


Figure  5-4;  Perspective  projection  of 
a  circle. 


would  appear  to  a  significant  limitation.  This  is  not  the  case  because  is  easy 
to  calibrate  and  it  does  not  change  iLenz  and  Tsai,  1988,  Penna,  1991].  is 
a  function  of  the  aspect  ratio  of  the  image  sensor  and  the  ratio  of  camera  and 
frame  grabber  clock  frequencies.  The  physical  properties  of  the  image  sensor 
cannot  change  and  modern  clocks  have  extremely  stable  frequencies.  Therefore 
it  is  very  reasonable  to  calibrate  once  and  then  forget  about  it.  could 
also  be  determined  via  self-calibration  removing  any  burden  to  the  user. 


5.2  Error  Analysis 

There  are  several  soimces  of  error  associated  with  processing  digital  images. 
One  of  the  more  significant  sources  is  quantization  errors  iKamgar-Parsi  and 
Kamgar-Parsi,  19891.  These  errors  are  the  result  of  taking  a  continuous  signal 
and  converting  it  to  digital  values.  First,  we  will  examine  errors  caused  by 
the  fact  that  pixel  values  are  only  available  at  discrete  locations  in  a  lattice. 
Consider  a  row  of  pixels  such  as  those  shown  in  Figime  5-5.  The  grid  represents 
the  pixel  lattice.  Pixels  have  just  two  states,  on  and  off,  with  shaded  squares 
representing  on  pixels.  Using  the  centroid  calculation  described  above  both 
rows  have  the  same  centroid.  The  maximiun  error  in  the  horizontal  position 
of  the  centroid  is  0.5  pixels.  In  the  vertical  direction  the  maximum  is  also  0.5 
pixels  so  the  maximiun  distance  between  the  calculated  centroid  and  the  actual 
centroid  is  1/ \/2 . 

The  maximum  error  can  be  reduced  significantly  by  using  a  circular  shape 
[Bose  and  Amir,  1990,  Efrat  and  Gotsman,  1993].  Figure  5-7  shows  a  digital 
approximation  of  a  circle.  The  improvement  which  results  from  using  a  circular 
shape  is  caused  by  the  fact  that  the  error  for  a  given  row  or  column  is  dependent 
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Figure  5-5:  The  effect  of  quanti-  Figure  5-6:  The  effect  of  quanti¬ 
zation  errors  on  the  centroid  of  a  zation  errors  on  the  length  of  a 
row  of  pixels.  row  of  pixels. 


Figure  5-7:  A  digital  approxima-  Figure  5-8:  Model  for  grey  scale 
tion  of  a  circle.  pixel  values. 
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Figure  5-9:  Error  in  the  centroid  of  a  circular  fiducial.  Error  is  the  distance 
in  pixels  between  the  actual  centroid  and  that  of  a  digital  approximation.  The 
radius  is  also  expressed  in  pixels. 

upon  the  errors  of  the  other  rows  or  columns.  In  short,  the  errors  tend  to  cancel 
out.  For  a  circle  of  radius  r  and  centroid  [ajo  2/o]  the  maximum  error  in  the 
calculated  centroid  fi{r)  is  given  by  the  following  expression. 

fi{r)  =  max  (II  [xo  yo]  -  CENTROID  (a:o,t/o,r,dr)||)  (5.15) 

CENTROID()  is  a  function  which  calculates  the  centroid  of  a  digital  approxi¬ 
mation  of  a  circle  using  (5.1)  through  (5.3),  (5.8)  and  (5.9).  p{x,y)  is  replaced 
with  INSIDE?()  which  returns  a  1  if  (a;  —  xq)'^  +  (z/  —  yo)^  ^  (r  -1-  dr)^  otherwise 
it  retxuns  0.  Figure  5-9  shows  the  maximum  error  in  the  centroid  calculation 
/i(r).  The  circles  are  the  result  of  evaluating  (5.15)  for  0.0  <  xq,  yo,  dr  <  1.0  with 
0.01  increments.  The  crosses  are  the  result  of  a  stochastic  sampling  method  to 
find  the  maximxun  over  the  same  region.  1/v^  is  also  plotted  on  the  axes.  The 
curve  fits  the  data  well  and  is  a  good  estimate  of 

The  radius  calculations  described  above  are  subject  to  errors  similar  to 
those  seen  for  the  centroid.  The  maximiun  error  in  the  length  is  1  pixel,  see 
Figure  5-6.  Using  a  circular  shape  will  also  reduce  the  error  in  the  calculated 
radius  for  the  same  reasons  as  above.  For  a  circle  of  radius  r  and  centroid 
[xq  yo]  the  maximum  error  in  the  calculated  radius  v{t)  is  given  by  the  following 
expression. 

v{t)=  max  (||r- RADIUS  (so,  2/0,  r,dr)||)  (5.16) 

FADIUS()  is  a  function  which  calculates  the  radius  of  a  digital  approximation 
of  a  circle  using  (5.1)  through  (5.6),  (5.8)  through  (5.10)  and  (5.11).  Again, 
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Figure  5-10:  Error  in  the  radius  of  a  circular  fiducial.  Error  is  the  difference 
in  pixels  between  the  actual  radius  and  that  of  a  digital  approximation.  The 
radius  is  also  expressed  in  pixels. 

p{x,y)  is  replaced  with  INSIDE?()  which  returns  a  1  if  (a;  -  xof  +  {y  -  yof  < 
(r  +  drf  otherwise  it  returns  0.  Figure  5-10  shows  the  maximum  error  in 
the  radius  calculation  i^(r).  The  circles  are  the  result  of  evaluating  (5.16)  for 
0.0  <  xo,yo,dr  <  1.0  with  0.01  increments.  The  crosses  are  the  result  of  a 
stochastic  sampling  method  to  find  the  maximum  over  the  same  region.  ^J2/3T 
is  also  plotted  on  the  axes."^  The  ciirve  fits  the  data  well  and  is  a  good  estimate 
of  ^'(r). 

The  maximum  error  for  both  the  centroid  and  radius  can  be  further  reduced 
by  using  grey  scale  values  rather  than  binary  values  [Chiorboli  and  Vecchi, 
1993].  Grey  scale  values  can  be  modeled  as  the  sum  of  a  number  of  binary 
sub-pixels.  Figure  5-8  shows  a  pixel  with  a  dynamic  range  of  122  and  a  value 
of  25.  This  effectively  increases  the  pixel  resolution  by  the  square  root  of  the 
dynamic  range,  y^iupper -Slower  +  1-  This  increase  in  the  resolution  increases  the 
effective  radius  of  the  circle. 

Another  class  of  quantization  error  is  caused  by  the  fact  that  pixel  values  are 
the  result  of  a  spatial  process.  The  value  of  a  particular  pixel  is  not  the  intensity 
at  some  infinitesimal  point,  rather  it  is  the  average  intensity  within  the  area 
of  the  pixel.  Figure  5-8  shows  a  model  of  a  grey  scale  pixel.  If  the  shaded  and 
mishaded  portions  of  the  pixel  represents  maximum  and  minimum  intensity 

^The  factor  of  results  because  (5.11)  assumes  a  elliptical  shape  and  our  digital  ap¬ 

proximations  are  not  truly  ellipses.  This  error  is  most  pronounced  at  small  radii  however  some 
error  will  always  be  present. 
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Figure  5-11:  A  pixel  partially  Figure  5-12;  Effect  on  a  circular 
covered  by  a  larger  figure.  figure. 


respectively,  then  the  pixel  has  a  value  of  0.2  or  25  on  a  scale  from  0  to  121.  If 
the  shaded  portion  is  produced  by  a  larger  rectangular  figure  for  which  we  are 
calculating  moments,  the  correct  values  for  x  and  y  to  use  in  (5.2)  through  (5.6) 
are  the  x  and  y  components  of  the  centroid  of  the  shaded  region.  The  centroid 
of  the  shaded  region  is  not  the  center  of  the  pixel,  however  (5.2)  through  (5.6) 
assume  that  it  is,  in  essence  treating  the  pixel  as  if  it  were  homogeneous.  The 
fact  that  the  centroid  of  the  region  which  produces  a  pixel’s  value  may  not  be 
the  center  of  the  pixel  introduces  error  into  the  moment  calculations.  Figure 
5-11  shows  a  pixel  only  partially  covered  by  a  larger  figure,  p  is  fraction  of  the 
pixel  covered  by  the  larger  figure.  The  calculated  and  actual  contribution  to  the 

first  moment,  ”^iactuai’  well  as  their  difference.  Ami,  are  given  by 

the  following  equations: 


^Icalculated  ~  W 

(5.17) 

’^lactual  =  2  j 

(5.18) 

Ami  =  '^IcalcuJated  -  '^lactuaJ 

=  p(l-i?)/2 

(5.19) 

A  maximum  Ami  of  1/8  occurs  when  p  =  1/2.  Ami  cannot  be  negative  therefore 
the  maximum  error  results  when  Ami  =  1/8  along  one  side  of  the  figure  and 
Ami  =  0  along  the  other.  We  will  assume  that  Ami  =  1/8  along  the  half  circle 
shown  in  Figure  5-12.®  The  circle  has  a  radius  of  r  and  Ami  is  towards  the 

This  assumption  overestimates  the  error  on  two  counts.  First  Ami  cannot  equal  1/8 
everywhere  along  the  hemi-circle.  Second  Ami  assumes  that  the  partial  figure  is  aligned  with 
the  pixel  grid.  If  it  is  not,  the  maximum  error  is  reduced. 
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Figure  5-13:  Error  in  the  centroid 
resulting  from  the  homogeneous  as- 
sximption.  Both  the  error  and  radius 
are  expressed  in  pixels 


Figure  5-14:  Error  in  the  radius  re¬ 
sulting  from  the  homogeneous  as¬ 
sumption.  Both  the  error  and  radius 
are  expressed  in  pixels. 


center  of  the  circle  producing  the  following  expression  for  the  contribution  to 
the  y  component  of  the  centroid: 


Amiy(a;) 


(5.20) 


Integrating  this  expression  from  — r  to  r  and  dividing  by  the  area  results  in  a 
maximtim  error  in  the  centroid  of  Aymax  =  l/16r  as  shown  in  Figure  5-13. 

An  analysis  of  the  error  in  the  second  moment  is  similar.  The  calculated 
and  actual  contribution  to  the  second  moment,  m2  calculated  and  as  well  as 

their  difference,  Am2,  are  given  by  the  following  equations: 


=  p(/  +  l/12) 

(5.21) 

’nZactual 

-  p(y-  2  ) 

(5.22) 

Am2 

^^2actual 

=  py  (1  -  p)  -  (1  -  3p  +  2p2) 


(5.23) 
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Figure  5-15:  The  ideal  intensity  pro-  Figure  5-16;  The  effect  of  bleeding  on 
file  for  a  cross  section  of  a  circular  disk,  the  disk  in  the  figure  to  the  left. 


The  maximum  Am2  and  the  value  of  p  at  which  it  occurs  are  functions  of  y.  We 
will  assume  that  =  1/2  and  Am2^  =  2//4.®  The  maximum  error  results 
when  Am2  =  2//4  around  the  entire  circumference  of  the  circle.  Doubling  Ato2 
and  correcting  for  the  orientation  produces 

Am2(a:)  =  — — .  (5.24) 

Integrating  this  expression  from  -r  to  r  and  substituting  into  (5.11)  results  in 

a  maximum  error  in  the  radius  of  Ar^ax  =  r  —  +  S/Stt  as  shown  in  Figure 

5-14. 

Next  we  will  consider  image  formation  errors.  These  errors  include  noise 
and  nonlinearities  in  the  image  sensor  [Dinstein  et  al,  1984,  Healey  and  Kon- 
depudy,  19941.  For  our  piirposes  the  most  significant  phenomenon  is  the 
smoothing  of  high  contrast  edges.  We  will  refer  to  this  as  bleeding.  Figure 
5-15  shows  the  ideal  intensity  profile  for  a  cross  section  of  a  circular  disk  and 
Figure  5-16  shows  the  effect  of  bleeding  on  the  same  disk.  We  have  shown  the 
transition  from  maximxim  intensity  to  minimiun  intensity  as  linear.  This  is 
almost  certainly  not  the  case,  however  it  makes  little  difference  for  om*  analy¬ 
sis.  As  long  as  the  transition  has  the  same  shape  all  along  the  circumference 
of  the  circle,  bleeding  has  no  effect  on  the  centroid.  The  inertia  of  the  two 
disks  shown  in  Figures  5-15  and  5-16,  however  are  not  the  same,  therefore  the 
radius  calculation  is  effected  by  bleeding.  In  order  to  explore  this  effect  we  will 
consider  the  ellipse  produced  by  the  following  function 

f[x,y)  =  min(max(t;,0),  1)  (5.25) 


1  —  !  a?  — 

1  —  (a  —  w)^  /a^ 


(5.26) 


a  and  b  are  the  semi-major  and  semi-minor  axis  of  the  ellipse  and  w  is  the  length 
of  the  transition,  v  was  chosen  because  it  is  a  good  approximation  to  a  linear 
transition  and  is  easily  integrable.  The  ellipse  is  shown  in  Figure  5-17.  The 

® Actually  p^ax  =  1/2  -  j/  ±  \/y^  +  1/12.  This  function  rapidly  approaches  an  asymptotic 
value  of  1/2.  For  y  =  4.2  the  actual  Am2„„  is  within  2%  of  1/2. 
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Figure  5-17;  Intensity  profile  for  an  ellipse. 


zeroth  and  second  moments  (about  the  axis  of  greatest  inertia)  for  f{x,  y)  are 
as  follows. 


=  ^(y,^-2a-w  +  2a?)  (5.27) 

+  +  (5.28) 

12a  ’ 

We  will  assmne  that  the  actual  edge  occurs  at  a  -  'a;/2  along  the  major  axis. 
The  calculated  semi-major  axis  can  be  found  by  substituting  mo  and  ra%^  into 
(5.11).  The  ratio  of  the  actual  edge  location  to  the  calcxilated  semi-major  axis 
r](a,b,w)  is  a  measure  of  the  error  introduced  by  bleeding  and  is  shown  below. 


3  -  2aw  +  2a^) _ 

7)(a,  b,  w)  {a  w/2)  ^  ^  —  Qa^w  +  3a‘^) 


(5.29) 


Figure  5-22  shows  a  plot  of7]{a,b,w).  The  transition  lengths  we  have  encoim- 
tered  are  typically  less  than  one  pixel. 

So  far  in  our  discussion  of  errors  (with  the  exception  of  bleeding)  we  have 
considered  circles  not  ellipses.  The  analysis  extends  easily  to  cover  ellipses. 
Two  effects  are  seen  as  the  figure  becomes  an  ellipse.  First  the  effective  radius 
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Figure  5-18:  The  effect  of  quanti¬ 
zation  errors  on  the  centroid  for 
a  rectangular  figiue  at  an  angle 
to  the  pixel  lattice. 


Figure  5-19:  The  effect  of  quan¬ 
tization  errors  on  the  radius  for 
a  rectangular  figure  at  an  angle 
to  the  pixel  lattice. 


is  the  semi-minor  axis  h.  Second,  additional  error  is  introduced  when  the  major 
axis  is  not  aligned  with  the  pixel  lattice.  Figures  5-18  and  5-19  show  two 
examples  of  the  latter  effect.  By  modifying  (5.15)  slightly  we  obtain: 

/i(r,  6/a)=  max  (II  [xo  2/o]  -  CENTROID  (®o,  2/0,  dr,  6/a,  0)11)  (5.30) 

6/ a  is  the  ratio  of  the  minor  axis  to  the  major  axis.  CENTRO  ID()  and  INSIDE?() 
are  modified  appropriately  to  handle  ellipses  at  any  angle  9  relative  to  the  x 
axis.  Figure  5-20  shows  the  maximvun  error  in  the  centroid  calculation  /i(r,  6/a). 
A  stochastic  sampling  method  was  used  to  find  the  maximiun  of  (5.30)  over  the 

region  0.0  <  xo,2/o,  dr  <  1.0,  0  <  0  <  tt  and  r  =  10.  is  also  plotted 

on  the  axes.  The  curve  fits  the  data  well  and  is  a  good  estimate  of  /i(r,  6/a).  By 
modifying  (5.16)  slightly  we  obtain: 


i/(r,6/a)  =  max  (l|r  -  RADIUS  (xq,  yo,  dr,  6/a,  0)  H) 

XQ  3/0  dr  ^ 


(5.31) 


RADIUS()  is  modified  appropriately  to  handle  ellipses  at  any  angle  0  relative 
to  the  X  axis.  Figure  5-21  shows  the  maximiun  error  in  the  radius  calculation 
i/(r,  6/a).  A  stochastic  sampling  method  was  used  to  find  the  maximum  of  (5.31) 

over  the  region  0.0  <  xo,yo,dr  <  1.0,  0  <  0  <  tt  and  r  =  10.  is 

Y  3£>/2 

also  plotted  on  the  axes.  The  curve  fits  the  data  well  and  is  a  good  estimate  of 
z/(r,  6/a). 
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Figure  5-20:  Centroid  error  for  an  el-  Figure  5-21:  Radius  error  for  an  ellip- 
liptical  fiducial  with  a  radius  of  10.  tical  fiducial  with  a  radius  of  10.  The 
Tlie  error  is  expressed  in  pixels  and  error  is  expressed  in  pixels  and  b/a  is 
bfa  is  the  ratio  of  minor  and  major  the  ratio  of  minor  and  major  axis, 
axis. 

Finally  we  will  consider  the  errors  introduced  by  our  assiunption  of  ortho¬ 
graphic  projection  for  a  fiducial.  As  shown  in  Figure  5-4,  some  error  is  intro¬ 
duced  in  the  centroid  as  well  as  the  semi-major  axis.  Consider  a  plane  rotated 
by  an  angle  of  (f)  about  an  axis  passing  through  [Xq  Fb  ^ol  in  camera  centered 
coordinates  which  is  parallel  to  the  x  axis.  Let  [X  Y]  represent  a  point  on  the 
plane  and  let  the  origin  of  the  plane  be  [Xo  Yq  ^ol-  Points  on  the  plane  project 
on  to  the  image  plane  by  the  following  relationships 


/(X  +  Xq) 
Y  sin  (j)Y  Zo 


(5.32) 


f(Y  cos  ^ -Ho) 
Y  sin  0  -1-  Zo 


(5.33) 


Next,  consider  a  circle  in  the  plane  and  centered  at  the  origin  with  a  radius  of 
T.  We  can  determine  the  effects  of  perspective  distortion  on  the  centroid  and 
radius  calculation  by  evaluating  following  continuous  versions  of  (5.1)  through 
(5.6). 
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r 

mp=  _ Sx'Sy' 

J-r 

(5.34) 

TTZx  =  /  /  _ x' Sx'Sy' 

J-r  7_yr2-y2 

(5.35) 

rr  /••\/r2-y2 

my  =  J  ^^y' Sx'Sy' 

(5.36) 

fT 

m^2=  /  - y'^  Sx'Sy' 

J-T  J-yr^-y^ 

(5.37) 

J —x  —yj 

(5.38) 

rr  /-v/r2-y2 

~  j - x'^Sx'Sy 

J-T  ■.'--y/r2-y2 

(5.39) 

The  error  in  the  x  component  of  the  centroid  a  is  the  difference  between  the 
projection  of  Xq  and  The  error  in  the  y  component  /3  is  defined  similarly 


fZpXo  fXp 
-  r^sin^^  Zq 


(5.40) 


^  /  [Zlr‘^siv?(j)  +  Ypr"^  cos  (^sin^,^  -  Zpr^  sin  (f)  -  Y^Zp  sin  cj)  +  Z^Yp  cos 

[Zl  -  T-^sin^^j  {Zp  cos  s4>  -  Fo  sin  (^)  Zp  ' 

The  equation  quantifying  the  effect  of  our  orthographic  projection  assumption 
on  the  semi-major  axis  is  as  follows. 


aZp 


(5.41) 


The  full  version  is  much  too  messy  to  include  here,  a  is  the  semi-major  axis 
of  the  projection  and  can  be  solved  for  using  (5.34)  through  (5.39),  (5.10)  and 
(5.11).  Figures  5-23  through  5-27  show  plots  of  a.,  f3  and  7  for  typical  values: 
Rp  -  10cm,  Zp  =  100cm,  r  =  0.5cm  and  /  =  2000  pixels.  Xp  and  Yp  are 
converted  to  polar  coordinates  such  that  Rp  =  y/xf^.  Rp  is  fixed  and  6  is 
one  of  the  axes  plotted.  Although  we  have  not  found  it  necessary,  estimates  of 
the  transition  length  and  the  minor  axis  length  can  be  easily  obtained  from  the 
image  and  used  to  improve  the  calculated  centroid  and  semi-major  axis. 
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Figure  5-22:  The  effect  of  bleeding 
on  the  radius  calculation.  Error  is 
expressed  as  the  ratio  of  the  actual 
radius  and  the  calculated  semi-major 
axis,  d'  ja'.  The  transition  length  and 
radius  are  expressed  in  pixels. 


Figure  5-23:  Perspective  error  in  the 
centroid  parallel  to  the  major  axis. 
The  error  is  expressed  in  pixels. 


Figure  5-24:  Perspective  error  in  the  Figure  5-25:  Total  perspective  error  in 
centroid  perpendicular  to  the  major  the  centroid.  The  error  is  expressed  in 
axis.  The  error  is  expressed  in  pixels,  pixels. 
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Figure  5-26:  Perspective  error  in  the  Figure  5-27:  Perspective  error  in  the 
semi-major  axis  for  Rq  =  10cm.  Error  semi-major  axis  for  Ro  =  10cm.  Error 
is  expressed  as  the  ratio  of  the  calcu-  is  expressed  as  the  ratio  of  the  calcu¬ 
lated  value  and  the  actual  radius.  lated  value  and  the  actual  radius. 

5.3  Experiments 

In  the  absence  of  noise,  two  images  of  a  fiducial  taken  from  the  same  position 
should  produce  the  same  centroid  and  semi-major  axis.  If  our  calculations 
were  exact,  a  series  of  images  taken  from  positions  displaced  only  in  a  direction 
perpendicular  to  the  optic  axis  should  produce  centroids  that  vary  linearly  with 
position.  Similarly,  a  series  of  images  taken  from  positions  displaced  only  in 
a  direction  parallel  to  the  optic  axis  should  produce  semi-major  axes  that  vary 
linearly  with  the  inverse  of  position.  The  degree  to  which  real  data  deviate 
from  these  ideals  is  an  empirical  measure  of  the  accuracy  of  our  calculations. 

Two  sets  of  experiments  were  conducted  using  an  optical  bench.  The  first 
set  of  experiments  examined  the  centroid  calculations,  the  second  set  the  semi- 
major  axis  calculations.  A  rail  with  a  precision  positioner  rims  along  one  side 
of  the  optical  bench.  For  the  first  set  of  experiments,  a  camera  was  mounted 
on  the  positioner  with  its  optic  axis  perpendicular  to  the  rail.  This  setup 
allows  camera  motion  only  in  the  x  direction.  Motion  in  the  y  direction  is 
achieved  by  rotating  the  camera  90°  in  its  mounting.  A  Klinger  DCS-750 
motor  controller  with  UE-72CC  positioner  was  used  to  precisely  position  the 
camera.  The  controller/positioner  combination  is  accurate  to  a  few  microns. 
On  the  optical  bench  roughly  a  meter  away  a  fiducial  was  mounted  so  that  it 
appeared  near  the  center  of  the  camera’s  field  of  view,  see  Figures  5-28  and 
5-29.  Experiments  were  performed  for  the  following  cases: 
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fiducial 


positioner 
and  camera 


Optical  Bench 


fiducial 


positioner 
and  camera 


Optical  Bench 


Figure  5-28:  Top  view  of  experimental  Figure  5-29:  Side  view  of  experimen- 
setup  for  centroid.  tal  setup  for  centroid. 


•  10  micron  and  100  micron  steps 

•  Motion  in  both  the  x  and  y  directions 

•  Fiducial  parallel  to  the  image  plane  and  at  a  45°  angle 

•  Light  and  dark  images 

For  each  experiment,  data  was  collected  at  26  camera  positions  along  the  rml. 
At  each  position,  100  images  were  collected.  The  mean  and  standard  devia¬ 
tion  were  calculated  for  each  position.  The  results  are  shown  in  Figures  5-30 
through  5-41.  The  mean  at  each  position  is  marked  by  an  “x”.  The  error  bars 
are  1  standard  deviation  above  and  below  the  mean.  The  line  is  the  least 
squares  best  fit  to  the  means.  Table  5.1  shows  both  the  theoretical  and  empir¬ 
ical  accuracy  of  the  centroid  calculation  for  three  conditions.  The  theoretical 
and  empirical  results  are  in  good  agreements. 


Condition 

D5mamic 

Range 

Major 

Axis 

Empirical 

Accuracy 

Theoretical 

Accuracy 

Bright,  fiat 

17.5 

0.065 

0.087 

Dark,  fiat 

6 

17.5 

0.15 

0.16 

45°  Angle 

17.5 

0.22 

0.16 

Table  5.1:  Empirical  and  theoretical  accuracy  for  centroid  calculations. 
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Figure  5-30:  Data  for  bright  image, 
camera  motion  in  the  x  direction  with 
100  micron  steps  and  fiducial  parallel 
to  the  image  plane. 


Figiire  5-31:  Data  for  bright  image, 
camera  motion  in  the  x  direction  with 
10  micron  steps  and  fiducial  parallel 
to  the  image  plane. 


Figure  5-32:  Data  for  bright  image, 
camera  motion  in  the  y  direction  with 
100  micron  steps  and  fiducial  parallel 
to  the  image  plane. 


Figure  5-33:  Data  for  bright  image, 
camera  motion  in  the  y  direction  with 
10  micron  steps  and  fiducial  parallel 
to  the  image  plane. 
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Figure  5-34:  Data  for  dark  image, 
camera  motion  in  the  x  direction  with 
100  micron  steps  and  fiducial  parallel 
to  the  image  plane. 


Figure  5-35;  Data  for  dark  image, 
camera  motion  in  the  x  direction  with 
10  micron  steps  and  fiducial  parallel 
to  the  image  plane. 


Figure  5-36:  Data  for  dark  image, 
camera  motion  in  the  y  direction  with 
100  micron  steps  and  fiducial  parallel 
to  the  image  plane. 


Figure  5-37:  Data  for  dark  image, 
camera  motion  in  the  y  direction  with 
10  micron  steps  and  fiducial  parallel 
to  the  image  plane. 
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Figure  5-38:  Data  for  bright  image, 
camera  motion  in  the  x  direction  with 
100  micron  steps  and  fiducial  45°  to 
the  image  plane. 


Figure  5-39:  Data  for  bright  image, 
camera  motion  in  the  x  direction  with 
10  micron  steps  and  fiducial  45°  to  the 
image  plane. 


Figure  5-40:  Data  for  bright  image, 
camera  motion  in  the  y  direction  with 
100  micron  steps  and  fiducial  45°  to 
the  image  plane. 


Figure  5-41:  Data  for  bright  image, 
camera  motion  in  the  y  direction  with 
10  micron  steps  and  fiducial  45°  to  the 
image  plane. 
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positioner 

fiducial  camera 


Optical  Bench 

Figvire  5-42:  Top  view  of  experimental  Figure  5-43:  Side  view  of  experimen- 
setup  for  semi-major  axis.  tal  setup  for  semi-major  axis. 

A  similar  set  of  experiments  were  conducted  for  the  semi-major  axis.  For 
this  set  of  experiments,  the  camera  was  mounted  with  its  optic  axis  parallel  to 
the  rail.  This  setup  allows  camera  motion  only  in  the  z  direction.  As  before 
a  fiducial  was  mounted  on  the  optical  bench  roughly  a  meter  away  so  that  it 
appeared  near  the  center  of  the  camera’s  field  of  view,  see  Figures  5-42  and 
5-43.  Experiments  were  performed  for  the  following  cases: 

•  Fiducial  parallel  to  the  image  plane  and  at  a  45°  angle 

•  Light  and  dark  images 

For  each  experiment,  data  was  collected  at  26  camera  positions  along  the  rail. 
At  each  position,  100  images  were  collected.  The  mean  and  standard  deviation 
were  calculated  for  the  major  axis  at  each  position.  The  least  squares  best  fit  to 
the  means  was  also  found.  The  results  are  shown  in  Figures  5-44  through  5-46. 
Table  5.2  shows  both  the  theoretical  and  empirical  accuracy  of  the  semi-major 
axis  calculation  for  three  conditions.  The  theoretical  and  empirical  results  are 
in  good  agreements. 


Optical  Bench 


fiducial 


positioner 
and  camera 


Condition 

D3mamic 

Range 

Major 

Axis 

Empirical 

Acciiracy 

Theoretical 

Accuracy 

Bright,  fiat 

40 

17.5 

0.15 

0.16 

Dark,  fiat 

4 

17.5 

0.23 

0.24 

45°  Angle 

50 

16.8 

0.24 

0.23 

Table  5.2:  Empirical  and  theoretical  accuracy  for  semi-major  axis  calcialations. 
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Figure  5-44:  Data  for  bright  image, 
camera  motion  in  the  z  direction  with 
2000  micron  steps  and  fiducial  parallel 
to  the  image  plane. 


Figure  5-45:  Data  for  dark  image, 
camera  motion  in  the  z  direction  with 
2000  micron  steps  and  fiducial  paral¬ 
lel  to  the  image  plane. 


Figure  5-46:  Data  for  bright  image, 
camera  motion  in  the  z  direction  with 
2000  micron  steps  and  fiducial  45°  to 
the  image  plane. 
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5.4  Discussion 

An  accurate,  efficient  and  robust  method  of  locating  fiducials  has  been  de¬ 
scribed.  A  thorough  error  analysis  of  the  method  has  been  provided  including 
both  theoretical  and  empirical  data.  This  data  confirms  that  in  the  worst  case 
fiducials  can  be  located  to  within  0.25  pixels  and  the  local  scale  factor  can  be 
determined  to  within  1.5%. 
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Chapter  6 

Initial  Calibration 


Before  our  fiducials  can  be  used  to  perform  enhanced  reality  visualizations, 
their  model  coordinates  must  be  known.  In  cases  were  the  model  coordinates  of 
the  fiducials  are  not  known  a  priori,  an  initial  calibration  must  be  performed. 
An  initial  alignment  is  performed  using  data  from  a  laser  scanner  [Crimson  et 
al.,  1994].  This  initial  alignment  along  with  an  image  showing  the  fiducials  is 
then  used  to  lookup  the  model  (MR  or  CT)  coordinates  of  the  fiducials.  Figure 
6-1  shows  an  overview  of  this  process.  Once  the  coordinates  of  the  fiducials 
have  been  established  the  laser  scanner  is  no  longer  needed  and  any  camera 
which  can  view  the  fiducials  can  be  used  for  enhanced  reality  visualization.  In 
this  chapter  we  provide  a  general  discussion  of  how  the  laser  scanner  works 
and  then  give  the  details  of  fiducial  calibration. 


Model  of  Patient’s 
Brain  and  Fiducials 

Figure  6-1:  Determining  the  model  (MR)  coordinates  of  the  fiducials. 


71 


72 


CHAPTERS.  INITIAL  CALIBRATION 


Obj€ct  being  scanned 


Laser 


Object  being  scanned 


Figure  6-2:  Side  view  of  scanner.  Figiare  6-3:  Object  and  laser  light 

plane  from  video  camera  perspec¬ 
tive. 


6.1  Laser  Scanner 

For  the  skull  data  shown  in  Chapter  7,  the  initial  calibration  was  performed 
with  the  aid  of  a  laser  range  scanner  produced  by  Technical  Arts  Corporation. 
It  produces  a  planar  sheet  of  light  which  is  scanned  using  an  oscillating  mirror. 
A  video  camera  is  placed  at  an  angle  to  the  plane  of  light,  see  Figure  6-2. 
The  X  axis  is  parallel  to  the  line  segment  joining  the  laser  and  video  camera. 
The  laser  is  oriented  so  that  the  line  of  illumination  formed  when  the  laser’s 
plane  of  light  strikes  an  object  is  perpendicular  to  the  x  axis.  The  y  axis  is 
parallel  to  the  line  of  illumination,  see  Figure  6-3.  The  z  axis  is  orthogonal 
to  both  the  x  and  y  axes.  The  y  axis  in  Figure  6-2  is  into  the  page  and  the  x 
axis  in  Figure  6-3  is  out  of  the  page.  The  three  dimensional  coordinates  of  an 
object’s  surface  can  be  determined  if  it  is  placed  so  that  it  can  be  simultaneously 
illuminated  by  the  laser  and  viewed  by  the  video  camera.  The  y  coordinate  of  a 
data  point  is  determined  directly  from  the  horizontal  displacement  measxored 
by  the  video  camera.  The  x  and  z  coordinates  are  recovered  using  the  vertical 
displacement  and  the  scan  angle  of  the  laser  beam.  A  single  scan  produces  240 
three  dimensional  measurements  with  accuracies  up  to  0.003”. 

Surface  points  on  the  patient’s  head  near  the  surgical  site  are  measured 
with  the  scanner.  These  data  points  are  aligned  with  a  model  of  the  patient’s 
head  and  brain  obtained  from  previous  MR  and/or  CT  scans  iGrimson  et  al, 
1994].  The  registration  is  produced  by  minimizing  the  sum  of  the  squared 
distance  between  the  laser  data  and  the  model.  The  model  is  sampled  at  several 
resolutions  to  speed  convergence  and  random  perturbations  are  used  to  avoid 
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Figure  6-4:  Model  of  patient’s  brain  Figure  6-5:  Patient,  scanner  and  coor 
with  coordinate  axes.  dinate  axes. 


Figure  6-6:  Model  of  patient’s  brain 
aligned  with  patient  (valid  only  for  the 
scanner). 


Figiire  6-7:  Model  of  a  skull. 
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Top  view 


Side  view 

Figure  6-8;  Laser  data  from  skull 


End  view 


Figure  6-9:  Video  of  skull  being  scanned. 
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Figure  6-10:  Laser  data  aligned  Figure  6-11:  Model  aligned  with 
with  skull.  video  of  the  same  skull. 

local  minima.  This  produces  a  transformation  from  the  coordinate  frame  of  the 
model  to  that  of  the  patient.  Figiire  6-4  shows  a  brain  model  and  its  coordinate 
system.  Figure  6-5  shows  the  patient,  scanner  and  their  coordinate  system. 
Figxire  6-6  shows  the  result  of  aligning  the  model  and  the  patient  coordinate 
systems.  It  should  be  noted  that  Figure  6-6  is  valid  only  from  the  perspective 
of  the  camera  attached  to  the  laser  scanner. 

Figiire  6-7  shows  a  model  obtained  from  CT  imaging  of  a  plastic  skull.  Figure 
6-8  shows  the  three  dimensional  data  obtained  from  laser  scanning  the  same 
skull.  Figure  6-9  shows  two  images  of  the  skull  containing  laser  scan  lines. 
Figure  6-10  shows  the  laser  data  aligned  with  the  skull.  Figure  6-11  shows  the 
model  of  the  skull  aligned  with  and  superimposed  upon  video  of  the  skull.  The 
accuracy  of  this  registration  is  believed  to  be  on  the  order  of  the  resolution  of 
the  MR  or  CT  data  (w  1mm). 


6.2  Calibration  Routine 

The  transformation  used  to  produce  Figure  6-11  essentially  maps  model  coor¬ 
dinates  to  image  coordinates.  This  is  exactly  the  inverse  of  what  we  need  to 
calibrate  our  fiducials.  What  we  would  like  to  be  able  to  do  is  measure  the  im¬ 
age  coordinates  of  the  fiducials  and  then  use  an  inverse  mapping  to  determine 
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their  model  coordinates.  The  inverse  mapping  is  not  in  general  a  one  to  one 
mapping.  The  image  coordinates  of  a  fiducial  determine  the  ray  in  three  space 
along  which  the  model  coordinates  of  the  fiducial  must  lie.  By  intersecting 
this  ray  with  the  model  and  Z-buffering  the  result  we  can  determine  the  model 
coordinate  of  the  fiducial. 

The  initial  calibration  is  limited  by  the  accuracy  of  the  registration  per¬ 
formed  using  the  laser  scanner.  The  laser  scanner  registration  produce  results 
which  are  both  repeatable  and  good  for  the  single  view  in  question.  It  is  not 
clear  how  accurate  the  registration  is  in  an  absolute  sense.  Absolute  accuracy 
is  important  for  calibrating  the  fiducials.  For  example,  small  errors  which  are 
imperceptible  from  one  point  of  view  frequently  lead  to  large  errors  from  dif¬ 
ferent  view  points.  Pertxmbing  the  laser  scanner  solution  by  less  than  1°  can 
change  a  fiducial’s  location  by  over  6mm.  The  uncertain  accuracy  of  the  fiducial 
calibration  bears  sigmficantly  on  the  quality  of  enhanced  reality  visualizations 
which  can  be  produced.  In  spite  of  this  issue,  the  results  shown  in  Chapter  7 
are  promising.  The  exact  source  and  nature  of  the  errors  in  the  initial  calibra¬ 
tion  needs  to  be  explored  further.  The  method  of  initial  calibration  presented 
here,  while  it  reqmres  the  use  of  a  laser  scanner,  has  the  advantage  that  it  can 
be  made  fully  automatic.  Of  course,  other  methods  of  initial  calibration  could 
be  used.  The  only  requirement  is  that  the  fiducial  locations  be  determined 
accurately.  How  this  information  is  obtained  does  not  matter. 


Chapter  7 
Results 


7.1  Test  Object 

To  determine  the  accuracy  of  our  method  we  performed  several  experiments 
using  a  special  test  object.  A  three  dimensional  object  with  seven  fiducials 
was  made.  The  relative  positions  of  the  fiducials  were  accurately  measured. 
Figure  7-1  shows  the  basic  test  object.  The  large  pillar  near  the  center  is  used 
to  measure  the  accuracy  of  the  enhanced  reality  visualization.  A  wire  frame 
corresponding  to  the  edges  of  the  pillar  is  displayed  in  the  enhanced  reality 
image.  The  difference  between  the  actual  edges  and  wire  frame  is  a  measxire 
of  the  accuracy  of  the  visualization.  For  comparison  purposes  a  wire  frame  is 
also  superimposed  on  a  shorter  pillar.  Experiments  were  performed  with  four 
slightly  different  fiducial  configurations.  The  first  configuration  consists  of  6 
coplanar  fiducials  plus  a  single  fiducial  1cm  above  the  plane.  The  enhanced 
reality  visualizations  produced  from  this  configuration  are  not  consistently  ac¬ 
curate.  Figm-es  7-2  through  7-5  are  tj^ical  of  the  resvdts  for  this  configuration. 
Notice  that  the  wire  frame  on  the  smaller  pillar  matches  in  all  of  the  images. 
Errors  occur  only  significantly  away  from  the  volume  enclosed  by  the  fiducials. 
The  second  configuration  moves  one  of  the  6  coplanar  fiducials  so  that  it  is  also 
1cm  above  the  plane.  Figvues  7-6  through  7-8  show  typical  results  for  this 
configuration.  The  accuracy  is  greatly  improved  with  the  addition  of  a  second 
noncoplanar  fiducial,  however  occasional  slight  errors  do  occur.  The  next  config¬ 
uration  adds  a  third  noncoplanar  fiducial.  Typical  results  for  this  configxuation 
are  shown  in  Figures  7-9  and  7-10.  With  this  configuration,  errors  rarely  occur 
and  when  they  do  they  are  small.  The  final  configuration  is  similar  to  the  first 
except  that  the  noncoplanar  fiducial  is  4cm  out  of  the  plane.  Results  for  this 
configuration  are  shown  in  Figures  7-11  and  7-12.  The  accuracy  of  this  config¬ 
uration  is  comparable  to  that  of  the  last.  Some  depth  in  the  model  is  required 
to  accurately  recover  the  third  dimension.  Once  sufficient  depth  is  present  in 
the  model,  it  appears  that  the  solution  is  accurate  over  a  large  volume. 

As  shown  in  Figures  7-2  through  7-12,  under  reasonable  conditions  our 
method  produces  good  results.  These  figures  provide  only  a  subjective  measure 
of  acctuacy.  Next  we  will  attempt  to  provide  a  more  quantitative  meastue. 
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Figure  7-3:  Test  object  view  2.  Figtire  7-4:  Test  object  view  3 
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Figxore  7-6:  Test  object  view  5 


Figure  7-5:  Test  object  view  4 
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Figure  7-7:  Test  object  view  6. 


Figure  7-8:  Test  object  view  7 


Figure  7-9:  Test  object  view  8.  Figure  7-10:  Test  object  view  9 
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Average 

Distance 

RMS 

Distance 

Maximum 

Distance 

Median 

Distance 

Number  of 
Points 

1.35 

1.46 

2.94 

1.38 

848 

Table  7.1:  Distance  between  edge-based  and  enhanced  reality  vertices. 


Average 

Misalignment 

RMS 

Misalignment 

Maximum 

Misalignment 

Median 

Misalignment 

Number  of 
Points 

0.88 

0.91 

1.77 

0.87 

212 

Table  7.2:  Misalignment  between  edge-based  and  enhanced  reality  polygons. 


Given  that  the  goal  of  enhanced  reality  visualization  is  to  produce  an  enhanced 
image,  the  proper  way  to  assess  accuracy  is  to  consider  the  deviation  of  the 
enhanced  image  from  the  ideal  image.  For  various  reasons  we  do  not  have 
access  to  the  ideal  image.  However,  we  can  compare  the  deviation  between 
the  image  of  a  physical  object  and  the  enhancement  produced  from  a  model  of 
the  same  physical  object.  For  example,  we  could  compare  the  wire  frame  to 
the  edges  of  the  test  object’s  central  pillar.  This  is  essentially  what  we  will  do. 
Figure  7-13  shows  the  same  test  object  used  above  with  one  change,  the  top  of 
the  central  pillar  had  been  darkened  to  provide  high  contrast  edges.  Edge  pixels 
are  extracted  and  chained  together  using  a  Canny  edge  detector.  A  line  is  fitted 
to  the  chains  by  minimizing  the  distance  between  the  edge  pixels  and  the  line. 
These  lines  are  used  to  compute  two  measures  of  accuracy.  The  first  measure  is 
the  distance  between  the  vertices  of  the  darkened  region  as  determined  by  our 
method  (enhanced  reality  visualization)  and  those  determined  by  intersecting 
the  lines  recovered  above.  The  second  measure  is  the  area  of  misalignment 
divided  by  the  perimeter  or  the  average  distance  between  the  two  polygons,  see 
Figure  7-14. 

A  sequence  of  212  images  of  the  test  object  rotating  through  360°  was  used. 
The  first  measirre  was  computed  for  each  of  the  four  vertices  in  each  image. 
The  distance  between  the  vertices  in  each  image  are  shown  in  Figures  7-15 
through  7-18.  The  second  measure  was  computed  for  the  darkened  polygon 
in  each  image  and  the  average  distance  between  the  polygons  for  each  image 
is  shown  in  Figure  7-19.  Tables  7.1  and  7.2  summarize  these  results.  The 
edge-based  positions  agree  well  with  those  produced  by  our  method.  It  should 
be  noted  that  edges  and  vertices  recovered  using  the  Canny  edge  detector  are 
not  without  error.  The  most  significant  source  of  error  is  the  fact  that  the 
implementation  used  only  localizes  the  edge  to  the  nearest  pixel.  As  a  result  of 
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Figiire  7-13;  Test  object  used  to 
quantify  accuracy. 


image 


Figure  7-15:  Distance  in  pixels  be¬ 
tween  edge-based  and  enhanced  real¬ 
ity  positions  for  vertex  1. 


Figure  7-14:  Misalignment  of 
edge-based  rectangle  and  en¬ 
hanced  reality  rectangle. 


image 


Figure  7-16:  Distance  in  pixels  be¬ 
tween  edge-based  and  enhanced  real¬ 
ity  positions  for  vertex  2. 
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Figure  7-17:  Distance  in  pixels  be¬ 
tween  edge-based  and  enhanced  real¬ 
ity  positions  for  vertex  3. 


Figure  7-18:  Distance  in  pixels  be 
tween  edge-based  and  enhanced  real 
ity  positions  for  vertex  4. 


Figiire  7-19:  Average  distance  be¬ 
tween  edge-based  and  enhanced  real¬ 
ity  polygons. 
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this,  the  first  measure  probably  over  estimates  the  difference  between  the  edge 
image  and  the  enhanced  image.  A  signed  measm-e  of  the  distance  from  point  to 
line^  was  also  computed  to  check  for  correlation  in  the  errors.  The  average  for 
the  first  measure  was  0.09  pixels  and  for  the  second  measirre  was  0.04  pixels 
making  it  unlikely  that  the  errors  in  edge-based  positions  are  correlated  with 
those  in  the  enhanced  reality  positions.  The  average  difference  between  the 
two  perimeters  is  likely  the  better  measure  of  accuracy  and  using  this  measime 
our  method  is  acctrrate  to  within  a  pixel. 


7.2  Skull 

After  calibrating  the  fiducials  as  described  in  Chapter  6,  enhanced  reality  vi¬ 
sualizations  were  performed  from  several  different  view  points.  Figure  7-20 
shows  the  initial  view  of  a  plastic  skull.  Figure  7-21  shows  the  results  of  the 
registration  using  the  laser  scanner.  The  white  dots  are  CT  data  points  for  the 
skull  superimposed  upon  an  image  of  the  skull.  Figiue  7-22  shows  the  results 
of  our  method  using  the  initial  view  point.  Note  that  as  expected  the  error  in 
the  Figures  7-21  and  7-22  are  comparable.  Figxires  7-23  through  7-32  show  the 
results  of  our  method  using  ten  novel  view  points.  The  exact  source  of  the  mis¬ 
alignment  present  in  some  of  the  figures  is  not  known.  Two  likely  sources  are 
the  initial  calibration  and  the  fact  that  the  fiducials  are  nearly  coplanar.  The 
errors  are  largest  for  view  points  sigmficantly  different  from  that  used  during 
the  initial  calibration  which  suggests  that  at  least  some  of  the  misalignment 
is  caused  by  errors  introduced  during  the  initial  calibration.  Also,  as  noted  in 
Chapter  6  small  perturbations  in  the  laser  scanner  alignment  cause  relatively 
large  variations  in  the  calibrated  positions  of  the  fiducials.  A  more  robust  initial 
calibration  and  a  method  of  handling  coplanar  fiducials,  which  together  should 
eliminate  these  errors,  are  currently  being  investigated. 


^The  sign  indicates  which  side  of  the  line  the  point  is  on. 
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Figure  7-23:  Skull  view  1 


Figure  7-24;  Skull  view  2 


Figure  7-25:  Skull  view  3 


Figure  7-26:  Skull  view  4 


Figiare  7-28:  Skull  view  6 


Figure  7-27:  Skull  view  5 


Figure  7-30:  Skull  view  8 


Figure  7-29:  Skull  view  7 
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Conclusions 


8.1  Future  Work 

There  are  many  improvements  and  extensions  which  can  be  made  to  the  basic 
method  presented  in  this  report.  Several  of  them  have  been  alluded  to  in 
previous  chapters. 

8. LI  Auto  calibration 

The  cxurent  implementation  requires  knowledge  of  the  ratio  of  pixel  spacing  in 
the  X  and  y  directions,  s^/y  in  order  to  recover  s.  While  s^/y  is  extremely  stable 
and  determining  its  value  is  straight  forward,  it  would  be  nice  to  eliminate 
this  reqxiirement.  Given  enough  noncoplanar  fiducials  it  should  be  possible  to 
solve  for  Solving  for  s^/y  might  be  fairly  time  consuming  but  it  should  only 
need  to  be  performed  once.  Similarly,  it  would  be  nice  to  be  able  to  handle  lens 
distortion  (although  we  have  not  found  it  necessary).  It  is  a  simple  matter  to 
add  a  lens  distortion  model  as  a  preprocessor.  The  fiducial  locations  are  simply 
corrected  by  the  amount  specified  by  the  model.  In  some  cases  considering  dis¬ 
tortion  would  svu’ely  improve  the  results.  Unfortimately  this  reqmres  finding 
the  proper  distortion  model.  There  are  several  well  established  methods  for 
determining  lens  distortion.  Ideally,  the  model  would  be  determined  automat¬ 
ically.  Lens  distortion  changes  with  several  of  the  other  camera  parameters 
such  as  aperture,  focus  and  zoom.  Even  so,  given  several  data  sets  consisting 
of  image  points,  model  points,  perspective  transformation,  aperture,  focus  and 
zoom  settings  we  should  be  able  to  construct  a  model  which  takes  into  accoimt 
the  effect  of  changes  to  apertiu-e,  focus  and  zoom.  It  should  be  necessary  to 
construct  a  new  model  or  modify  an  old  model  only  occasionally.  Oxu’  method 
as  it  is  currently  implemented  models  a  linear  approximation  to  lens  distor¬ 
tion  implicitly.  Automatically  determining  a  value  for  s^/y  and  a  model  for  lens 
distortion  are  two  examples  of  auto  calibration.  Auto  calibration  would  bridge 
the  gap  between  what  is  implicitly  modelable  and  other  required  or  desired 
parameters.  The  parameters  which  are  implicitly  modeled  are  those  which 
can  change  from  one  image  to  the  next.  The  parameters  which  are  candidates 
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for  auto  calibration  are  either  fixed  or  vary  on  much  longer  time  scales.  This 
makes  it  possible  to  run  an  auto  calibration  routine  in  the  background  updating 
parameters  as  necessary  but  certainly  not  every  frame.  Auto  calibration  would 
enable  a  completely  uncalibrated  camera  with  a  poor  quality  lens  to  be  used  to 
produce  very  accurate  enhanced  reality  visualizations. 

8.1.2  General  Features  and  Self  Extending  Models 

Where  auto  calibration  enables  camera  parameters  to  be  recovered,  self  ex¬ 
tending  models  allow  recovery  of  model  parameters.  Given  a  partial  model  and 
several  solutions  from  different  view  points  it  should  be  possible  to  recover  the 
three  dimensional  location  of  points  not  in  the  model  but  which  are  visible. 
This  effectively  extends  the  model.  The  ability  to  use  general  features  in  place 
of  fiducials  would  be  a  significant  improvement  for  many  potential  applications 
and  makes  self  extending  models  truly  useful.  The  circular  fiducial  currently 
used  facilitates  recovery  of  depth  information.  The  same  kind  of  size  informa¬ 
tion  can  potentially  be  recovered  from  naturally  occuring  spatial  features  such 
as  patches  of  texture,  etc.  Another  possible  option  is  to  use  stereo  to  recover 
depth  information.  Since  relative  depth  is  all  that  is  required  the  stereo  cam¬ 
eras  need  not  be  calibrated.  Using  a  stereo  setup  would  eliminate  the  need  to 
know  5x/y  and  would  facilitate  a  stereo  display. 

8.L3  Miscellaneous 

Several  other  more  modest  improvements  or  extensions  also  exist.  As  noted  in 
Chapters  4  and  7,  noncoplanar  points  significantly  improve  the  results  of  our 
method.  With  a  slight  modification  to  our  method  it  should  be  possible  to  use 
planar  data  exclusively.  The  modification  involves  solving  a  quartic  equation. 
The  ability  to  frmction  when  all  of  the  fiducials  are  coplanar  would  certainly  be 
an  asset.  It  is  not  clear  what  accuracy  can  be  achieved  by  this  approach.  As 
noted  in  Chapter  6  there  is  room  for  improvement  in  the  initial  calibration  of 
fiducials.  At  a  minimmn  a  better  understanding  of  the  soixrce  and  nature  of 
errors  is  needed.  A  more  robust  method  of  calibrating  fiducials  would  also  be 
very  useful.  Finally,  the  cmrent  implementation  can  be  optimized  significantly 
for  speed. 


8.2  Applications 

Neurosurgery  is  just  one  application  for  enhanced  reality  visualization.  There 
are  numerous  other  potential  applications  both  inside  and  outside  the  medical 
field.  These  applications  range  from  manufacturing  and  repair  to  navigation 
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and  rescue.  Enhanced  reality  visualization  can  be  most  readily  applied  in 
domains  where  models  either  already  exist  or  are  easily  obtainable.  In  the 
medical  field  models  are  easily  obtained  using  internal  anatomy  scanners  such 
as  MR  and  CT.  Models  already  exist  for  many  repair  and  manufacturing  en¬ 
vironments.  Before  enhanced  reality  visualization  will  be  accepted  for  routine 
use  an  accurate  and  robust  method  of  performing  the  visualization  must  be 
developed.  Our  method  makes  significant  progress  towards  this  goal.  Better 
methods  of  generating  models  will  make  it  practical  to  apply  enhanced  reality 
visualization  to  many  more  situations.  Because  it  is  anchored  in  the  real  world 
it  has  the  potential  to  affect  our  everyday  lives.  For  example,  enhanced  reality 
visualization  can  be  used  to  transcend  the  limitations  of  computer  momtors.  By 
moving  the  display  off  the  desk  and  into  a  visor  or  pair  of  goggles  the  di^lay 
becomes  much  more  useful.  Multiple  virtual  screens  can  be  defined.  These 
virtual  screens  are  anchored  in  the  real  world  so  the  user,  rather  than  fi^bhng 
with  a  mouse  to  expose  the  desired  window,  can  simply  look  around  and  exam¬ 
ine  the  contents  of  the  various  virtual  screens.  In  effect  the  size  of  the  display 
becomes  unlimited.  Since  the  virtual  screens  are  anchored  in  the  real  world 
they  are  spatially  organized  which  is  a  powerful  orgamzational  metaphor,  h  or 
example,  you  can  define  a  virtual  screen  containing  a  phone  list  and  attach  it 
to  your  bulletin  board.  Whenever  you  need  to  check  a  phone  number  on  the  list 
all  that  need  be  done  is  look  at  the  bulletin  board.  The  phone  list  will  always 
be  where  you  posted  it. 


8.3  Discussion 

A  new  method  for  performing  enhanced  reality  visualization  has  been  devel¬ 
oped  The  method  achieves  good  results  using  just  a  few  fiducials  placed  near 
the  volume  of  interest.  Noncoplanar  fiducials  yield  better  results.  Our  inethod 
allows  for  motion  and  automatically  corrects  for  changes  to  the  internal  cam¬ 
era  parameters  (focal  length,  focus,  aperture,  etc.)  making  it  particularly  well 
suited  to  enhanced  reality  visualization  in  dynamic  environments.  In  a  surgical 
application,  we  place  a  few  fiducials  placed  near  the  surgical  site  immediate  y 
prior  to  surgery.  An  initial  calibration  is  performed  using  a  laser  scanner.  Al¬ 
ter  the  calibration  is  performed,  our  method  is  fully  automatic,  runs  in  nearly 
real-time  and  is  accurate  to  a  fraction  of  a  pixel  and  reqmres  only  a  single 

image. 
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Appendix  A 

Effects  of  Radial  Distortion 


As  discussed  in  previous  chapters,  our  method  only  models  a  linear  approxima¬ 
tion  to  radial  distortion.  In  this  appendix  we  will  consider  the  consequences  of 
this  approximations.  Radial  distortion  is  typically  modeled  as  follows  [Slama, 
1980]; 


(A.1) 

y'u  =  y'd  +  ^y 

(A.2) 

Sx  =  {x'i  —  xo)  +  K2r'‘^ 

(A.3) 

h  =  (y'd  -  2/0)  {Kir'l  +  iir2r'd) 

(A.4) 

Where  x'^  and  yj,  are  the  undistorted  pixel  coordinates,  x'^  and  y'^  are  the  dis¬ 
torted  pixel  coordinates  with  r'\  —  (x'd  —  ®o)^  +  {v'i  ~  Vo)  >  -^2  are 

the  radial  distortion  coefficients.  If  Ki  and  K2  are  known,  it  is  quite  simple 
to  use  this  radial  distortion  model  to  correct  the  data  before  passing  it  to  our 
method.  However,  what  if  ifi  and  K2  are  not  known?  The  linear  approximation 
contained  in  oxrr  method  works  well  if  radial  distortion  is  not  too  great.  It  also 
works  well  if  the  fiducials  are  near  the  location  of  the  enhancement. 

As  a  preliminary  step,  we  measured  the  radial  distortion  for  our  camera 
setup.  Radial  distortion  was  measTxred  using  the  pl\unb-line  method  [Brown, 
1971,  Stein,  1993]  for  a  16mm  and  25mm  lens.  The  results  are  summarized  in 
Table  A.l.  Plots  of  the  distortion  are  also  shown  in  Figiues  A-1  and  A-2. 

In  order  to  gain  some  insight  into  the  effect  of  radial  distortion  on  our  method 
we  will  attempt  to  quantify  the  difference  between  the  radial  distortion  model 


Lens 

So 

yo 

Ki 

K2 

16mm 

336 

246 

2.0e-8 

2.0e-13 

25mm 

332 

248 

-4.0e-8 

l.Oe-13 

Table  A.1:  Radial  distortion  parameters  for  a  16mm  and  25mm  lens. 


93 


94 


APPENDIX  A  EFFECTS  OF  RADIAL  DISTORTION 


described  above  and  the  linear  approximation  included  in  our  method.  To  quan¬ 
tify  the  difference,  a  set  of  5  three  dimensional  control  points  K  were  selected 
and  a  perspective  transformation  V  (composed  of  a  rigid  transformation  and  a 
camera  calibration  matrix  as  described  in  Chapter  4)  was  defined.  The  control 
points  were  then  projected  onto  the  image  plane  using  (4.2)  producing  a  set  of 
undistorted  image  points  /u.  The  undistorted  image  points  were  then  distorted 
hy  Sd=  [8x  8y]  based  on  the  radial  distortion  measured  above  producing  a  set  of 
distorted  image  points  Jj.  Now  we  have  a  set  of  control  (model)  points  and  a  set 
of  distorted  image  points.  This  is  exactly  the  information  that  our  method  uses 
to  compute  a  perspective  transformation.  We  compute  a  new  perspective  trans¬ 
formation  V'  using  (4.8)  through  (4.10).  The  effect  of  the  linear  approximation 
on  an  arbitrary  three  dimensional  point  M  can  be  expressed  as  follows: 


V  =  \\{MV +  8d)- MV'W^  (A.5) 

Figures  A-3  and  A-4  show  two  plots  of  v  for  a  plane  parallel  to  the  image 
plane  and  passing  through  the  control  points.  The  five  spikes  mark  the  image 
locations  of  the  five  control  points.  Notice  that  globally  v  is  no  worse  than  the 
raw  radial  distortion.  Further,  v  is  small  in  the  vicinity  of  the  control  points 
even  when  the  control  points  are  located  at  the  edge  of  the  image.  These  two 
characteristics  were  true  for  all  of  the  simulations  that  we  ran.  This  confirms 
the  claim  that  oiir  method  works  well  if  the  radial  distortion  is  not  too  great  or 
the  enhancement  is  close  to  the  fiducials. 

Finally,  we  performed  an  experiment  using  a  lens  with  significant  distortion. 
Figure  A-5  shows  an  image  taken  using  a  4.8mm  lens.  The  test  object  is  the 
same  one  which  appeared  in  earlier  figures.  The  distortion  is  readily  apparent. 
Figures  A-6  through  A-8  show  the  results  of  oiir  method  without  correcting  for 
distortion. 

In  general,  a  more  accvuate  model  produces  more  accurate  results.  Correct¬ 
ing  for  radial  distortion  undoubtedly  would  improve  the  results  of  our  method. 
However,  as  shown  in  this  appendix  for  enhanced  reality  visualization  using 
reasonable  cameras  in  realistic  viewing  situations  the  improvement  is  almost 
insignificant. 


Figure  A-1:  Radial  distortion  in  pixels  Figiu'e  A-2:  Radial  distortion  in  pixels 
for  a  16mm  lens.  for  a  25mm  lens. 


Figure  A-3:  Effect  of  radial  distortion  Figure  A-4:  Effect  of  radial  distortion 
on  our  method  with  the  fiducials  near  on  our  method  with  the  fiducials  near 
the  center  of  the  image.  the  edge  of  the  image. 
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Figure  A-7:  Distorted 
view  2. 


image  Figure  A-8:  Distorted 
view  3. 
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Figure  A- 5:  Image  with 
cant  distortion. 

s 

igmfi-  Figure  A-6:  Distorted  image 
view  1. 
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